sv2000 commented on a change in pull request #2966: URL: https://github.com/apache/incubator-gobblin/pull/2966#discussion_r420897902

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcKeyDedupReducer.java
##########
@@ -37,6 +53,30 @@ protected void setOutKey(OrcValue valueToRetain) {
     // do nothing since initReusableObject has assigned value for outKey.
   }
+  @Override
+  protected void reduce(OrcKey key, Iterable<OrcValue> values, Context context)
+      throws IOException, InterruptedException {
+
+    /* Map from hash of value(Typed in OrcStruct) object to its times of duplication*/
+    Map<Integer, Integer> valuesToRetain = new HashMap<>();
+    int valueHash = 0;
+
+    for (OrcValue value : values) {

Review comment:
Thanks for the explanation. Makes sense.

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/RecordKeyDedupReducerBase.java
##########
@@ -88,21 +87,33 @@ protected void reduce(KI key, Iterable<VI> values, Context context)
       numVals++;
     }
+    writeRetainValue(valueToRetain, context);

Review comment:
Ah ok. Maybe name it writeRetainedValue then?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org