Re: [PR] [HUDI-9505] Hudi 1.1 blocker code change of index look up [hudi]

via GitHub Wed, 25 Jun 2025 18:02:17 -0700


vinothchandar commented on code in PR #13414:
URL: https://github.com/apache/hudi/pull/13414#discussion_r2167884338



##########
hudi-common/src/main/java/org/apache/hudi/common/data/HoodiePairData.java:
##########
@@ -153,6 +154,23 @@ <L, W> HoodiePairData<L, W> mapToPair(
    */
   List<Pair<K, V>> collectAsList();
 
+  /**
+   * Collects results of the underlying collection into a {@link Map<Pair<K, V>>}
+   * If there are multiple pairs sharing the same key, the resulting map randomly picks one among them.

Review Comment:
   We want to keep the `Data` abstractions simple, close to what a Spark RDD 
provides. So if you have custom processing logic to munge the data, like the 
de-duping, let's keep it outside of the `Data` abstraction. Otherwise it will 
become harder to maintain. 
   
   I suggest we do this logic outside, at the call sites.
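   
   A minimal sketch of what the caller-side version might look like, not the 
actual change in this PR. It assumes the caller has already materialized the 
pairs via `collectAsList()`; `java.util.Map.Entry` stands in for Hudi's `Pair` 
so the snippet is self-contained, and `CallerSideDedup` and 
`toMapKeepingFirst` are hypothetical names. The policy here (first value wins) 
is only one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical caller-side helper; not part of Hudi.
final class CallerSideDedup {

  // Assumption: `pairs` plays the role of the list returned by
  // HoodiePairData#collectAsList(), with Map.Entry standing in for Pair<K, V>.
  static <K, V> Map<K, V> toMapKeepingFirst(List<Map.Entry<K, V>> pairs) {
    Map<K, V> result = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      // Explicit, caller-chosen policy: the first value seen for a key wins.
      result.putIfAbsent(pair.getKey(), pair.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> collected = List.of(
        Map.entry("k1", 1),
        Map.entry("k2", 2),
        Map.entry("k1", 3));
    // "k1" appears twice; only its first value is kept.
    System.out.println(toMapKeepingFirst(collected));
  }
}
```

   Keeping the de-dup policy at the call site makes the choice visible to the 
reader of that code, and leaves the `Data` interface with only the plain 
collect operation.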



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
