vinothchandar commented on code in PR #13414:
URL: https://github.com/apache/hudi/pull/13414#discussion_r2167884338
##########
hudi-common/src/main/java/org/apache/hudi/common/data/HoodiePairData.java:
##########
@@ -153,6 +154,23 @@ <L, W> HoodiePairData<L, W> mapToPair(
*/
List<Pair<K, V>> collectAsList();
+ /**
+ * Collects results of the underlying collection into a {@link Map<Pair<K, V>>}
+ * If there are multiple pairs sharing the same key, the resulting map randomly picks one among them.
Review Comment:
We want to keep the `Data` abstractions simple, close to what a Spark RDD
provides. So if you have custom processing logic that munges the data, like
this de-duping, let's keep it outside of the `Data` abstraction. Otherwise it
will become harder to maintain.
I am suggesting we do this logic outside, at the call sites.
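
For illustration, a minimal sketch of what caller-side de-duping could look like once the pairs have been collected with `collectAsList()`. The class `CallerSideDedup` and the helper `toMapKeepingFirst` are hypothetical names, not part of Hudi, and `java.util.Map.Entry` stands in for Hudi's `Pair` so the snippet runs on its own. It also assumes that "keep the first pair seen" is an acceptable policy; the caller can choose a different one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CallerSideDedup {

  // Hypothetical helper that lives at the call site, not in the Data abstraction.
  // Map.Entry is a stand-in for Hudi's Pair (an assumption made for this sketch).
  static <K, V> Map<K, V> toMapKeepingFirst(List<Map.Entry<K, V>> pairs) {
    Map<K, V> result = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      // putIfAbsent keeps the first value seen for a key and ignores later duplicates.
      result.putIfAbsent(pair.getKey(), pair.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the list a caller would get back from collectAsList().
    List<Map.Entry<String, Integer>> collected = List.of(
        Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    Map<String, Integer> deduped = toMapKeepingFirst(collected);
    System.out.println(deduped);
  }
}
```

Keeping the policy in the caller makes the choice of which duplicate wins explicit, instead of leaving it as "randomly picks one" inside the abstraction.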