nsivabalan commented on issue #1021: how can i deal this problem when partition's value changed with the same row_key?
URL: https://github.com/apache/incubator-hudi/issues/1021#issuecomment-559224596

@vinothchandar: I found the root cause. It is in HoodieGlobalIndex#explodeRecordRDDWithFileComparisons:

```
JavaRDD<Tuple2<String, HoodieKey>> explodeRecordRDDWithFileComparisons(
    final Map<String, List<BloomIndexFileInfo>> partitionToFileIndexInfo,
    JavaPairRDD<String, String> partitionRecordKeyPairRDD) {
  . . .
  return partitionRecordKeyPairRDD.map(partitionRecordKeyPair -> {
    String recordKey = partitionRecordKeyPair._2();
    String partitionPath = partitionRecordKeyPair._1();
    return indexFileFilter.getMatchingFiles(partitionPath, recordKey).stream()
        .map(file -> new Tuple2<>(file, new HoodieKey(recordKey, indexToPartitionMap.get(file))))
        .collect(Collectors.toList());
  }).flatMap(List::iterator);
}
```

Here, indexFileFilter.getMatchingFiles(partitionPath, recordKey) returns a fileId from Partition1, whereas the incoming record is tagged with Partition2.

This is what I am thinking of as the fix. As of now, IndexFileFilter.getMatchingFiles(String partitionPath, String recordKey) returns a Set of fileIds. Instead, it should return a Set<Pair<String, String>> of (partitionPath, fileId) pairs, and we should attach the partition path from each pair to the HoodieKey, as below:

```
return partitionRecordKeyPairRDD.map(partitionRecordKeyPair -> {
  String recordKey = partitionRecordKeyPair._2();
  String partitionPath = partitionRecordKeyPair._1();
  // Each match is assumed to be a (origPartitionPath, matchingFile) pair
  // exposing getLeft() and getRight().
  return indexFileFilter.getMatchingFiles(partitionPath, recordKey).stream()
      .map(partitionFilePair -> new Tuple2<>(
          partitionFilePair.getRight(),
          new HoodieKey(recordKey, partitionFilePair.getLeft())))
      .collect(Collectors.toList());
}).flatMap(List::iterator);
```

But how do we inform the user that these records were updated in Partition1, and not in Partition2 as specified by the incoming records in the upsert call?
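
To make the proposal easier to reason about, here is a minimal, Spark-free sketch of the idea in plain Java. It is not Hudi code: `GlobalIndexSketch`, `FileInfo`, `Comparison`, and `explode` are hypothetical names made up for illustration, and a simple min/max key range check stands in for the real index file filter. The only point it tries to show is that each comparison carries the partition the matching file lives in, rather than the partition on the incoming record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class GlobalIndexSketch {

    /** Hypothetical stand-in for a file's key range; not Hudi's BloomIndexFileInfo. */
    static final class FileInfo {
        final String partitionPath, fileId, minKey, maxKey;

        FileInfo(String partitionPath, String fileId, String minKey, String maxKey) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        /** True when recordKey falls inside this file's [minKey, maxKey] range. */
        boolean mayContain(String recordKey) {
            return minKey.compareTo(recordKey) <= 0 && recordKey.compareTo(maxKey) <= 0;
        }
    }

    /** Hypothetical stand-in for the (fileId, HoodieKey) tuple emitted per candidate file. */
    static final class Comparison {
        final String fileId, recordKey, partitionPath;

        Comparison(String fileId, String recordKey, String partitionPath) {
            this.fileId = fileId;
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }

        @Override
        public String toString() {
            return fileId + " -> (" + recordKey + ", " + partitionPath + ")";
        }
    }

    /**
     * Global lookup: candidate files are drawn from every partition, and each
     * comparison is keyed by the file's own partition. The incoming partition
     * is deliberately unused here; it is kept only to mirror the shape of
     * getMatchingFiles(partitionPath, recordKey).
     */
    static List<Comparison> explode(List<FileInfo> allFiles, String incomingPartition, String recordKey) {
        return allFiles.stream()
            .filter(f -> f.mayContain(recordKey))
            .map(f -> new Comparison(f.fileId, recordKey, f.partitionPath))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
            new FileInfo("2019/11/01", "file-1", "key-000", "key-100"),
            new FileInfo("2019/11/02", "file-2", "key-200", "key-300"));

        // The record arrives tagged with 2019/11/02, but its key falls in file-1.
        System.out.println(explode(files, "2019/11/02", "key-042"));
        // prints [file-1 -> (key-042, 2019/11/01)]
    }
}
```

In this sketch the record is paired with the partition of the file that may already contain it, which is the behaviour the proposed `Set<Pair<...>>` return type is meant to enable. It does not address the open question of how to report the partition mismatch back to the caller.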
