nsivabalan commented on issue #1021: how can i deal this problem when 
partition's value changed with the same row_key? 
URL: https://github.com/apache/incubator-hudi/issues/1021#issuecomment-559224596
 
 
   @vinothchandar : I found the root cause.  
   
   within HoodieGlobalIndex#explodeRecordRDDWithFileComparisons
   
   ```
   JavaRDD<Tuple2<String, HoodieKey>> explodeRecordRDDWithFileComparisons(
         final Map<String, List<BloomIndexFileInfo>> partitionToFileIndexInfo,
         JavaPairRDD<String, String> partitionRecordKeyPairRDD) {
   .
   .
   .
    return partitionRecordKeyPairRDD.map(partitionRecordKeyPair -> {
            String recordKey = partitionRecordKeyPair._2();
            String partitionPath = partitionRecordKeyPair._1();
   
            return indexFileFilter.getMatchingFiles(partitionPath, recordKey).stream()
                .map(file -> new Tuple2<>(file, new HoodieKey(recordKey, indexToPartitionMap.get(file))))
                .collect(Collectors.toList());
       }).flatMap(List::iterator);
   ```
   
   
   
   Here, indexFileFilter.getMatchingFiles(partitionPath, recordKey) returns a
   fileId from Partition1, whereas the incoming record is tagged with Partition2.
   
   So this is the fix I have in mind. As of now,
   IndexFileFilter.getMatchingFiles(String partitionPath, String recordKey)
   returns a Set of fileIds. Instead, it should return
   Set<Pair<PartitionPath, fileId>>, and we should attach the partition path
   from each pair to the HoodieKey, as below.
   ```
   return partitionRecordKeyPairRDD.map(partitionRecordKeyPair -> {
         String recordKey = partitionRecordKeyPair._2();
         String partitionPath = partitionRecordKeyPair._1();
   
         // Proposed: getMatchingFiles returns Set<Pair<partitionPath, fileId>>
         return indexFileFilter.getMatchingFiles(partitionPath, recordKey).stream()
             .map(partitionFilePair -> new Tuple2<>(partitionFilePair.getRight(),
                 new HoodieKey(recordKey, partitionFilePair.getLeft())))
             .collect(Collectors.toList());
       }).flatMap(List::iterator);
   ```
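
To make the idea concrete outside of Spark, here is a minimal, self-contained sketch of the same lookup in plain Java. Everything in it is hypothetical and for illustration only: `FileRange`, `getMatchingFilesAndPartition`, `tagWithOwningPartition`, and the sample file ids and key ranges are stand-ins, not Hudi classes or APIs. It only shows that when the lookup returns (partitionPath, fileId) pairs, the key can be tagged with the partition that actually owns the file rather than the partition on the incoming record.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class GlobalIndexLookupSketch {

    // Hypothetical stand-in for per-file key-range metadata (not a Hudi class).
    static final class FileRange {
        final String partitionPath;
        final String fileId;
        final String minKey;
        final String maxKey;

        FileRange(String partitionPath, String fileId, String minKey, String maxKey) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        // True when recordKey falls within [minKey, maxKey], compared as strings.
        boolean mayContain(String recordKey) {
            return minKey.compareTo(recordKey) <= 0 && recordKey.compareTo(maxKey) <= 0;
        }
    }

    // Global lookup: checks files in every partition and returns
    // (partitionPath, fileId) pairs instead of bare file ids.
    static List<Map.Entry<String, String>> getMatchingFilesAndPartition(
            List<FileRange> allFiles, String recordKey) {
        List<Map.Entry<String, String>> matches = new ArrayList<>();
        for (FileRange file : allFiles) {
            if (file.mayContain(recordKey)) {
                matches.add(new SimpleImmutableEntry<>(file.partitionPath, file.fileId));
            }
        }
        return matches;
    }

    // Tags the record key with the partition that owns the matching file.
    // incomingPartition is deliberately unused: a global lookup ignores it.
    static List<String> tagWithOwningPartition(
            List<FileRange> allFiles, String incomingPartition, String recordKey) {
        List<String> tagged = new ArrayList<>();
        for (Map.Entry<String, String> match : getMatchingFilesAndPartition(allFiles, recordKey)) {
            String origPartitionPath = match.getKey();
            String fileId = match.getValue();
            tagged.add(fileId + " -> (" + recordKey + ", " + origPartitionPath + ")");
        }
        return tagged;
    }

    // Made-up sample data: one file per partition with disjoint key ranges.
    static List<FileRange> sampleFiles() {
        return Arrays.asList(
            new FileRange("2019/11/01", "f1", "key001", "key100"),
            new FileRange("2019/11/02", "f2", "key101", "key200"));
    }

    public static void main(String[] args) {
        // The incoming record claims partition 2019/11/02, but its key lives in f1.
        System.out.println(tagWithOwningPartition(sampleFiles(), "2019/11/02", "key042"));
    }
}
```

In this sketch the record arrives tagged with the second partition, yet the resulting key carries the first partition, because that is where the matching file lives. The open question below, how to tell the user about that, is not addressed by the sketch.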
   
   But how do we inform the user that these records were updated in
   Partition1, and not in Partition2 as the incoming records in the upsert
   call specify?
