nsivabalan commented on code in PR #13295:
URL: https://github.com/apache/hudi/pull/13295#discussion_r2112563984


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1481,18 +1487,21 @@ protected HoodieData<HoodieRecord> prepRecords(Map<String, HoodieData<HoodieReco
         ValidationUtils.checkArgument(fileGroupCount > 0, String.format("FileGroup count for MDT partition %s should be > 0", partitionName));
 
         List<FileSlice> finalFileSlices = fileSlices;
+        Set<String> mappedFileIds = new HashSet<>();
         HoodieData<HoodieRecord> rddSinglePartitionRecords = records.map(r -> {
           FileSlice slice = finalFileSlices.get(HoodieTableMetadataUtil.mapRecordKeyToFileGroupIndex(r.getRecordKey(),
               fileGroupCount));
           r.unseal();
           r.setCurrentLocation(new HoodieRecordLocation(slice.getBaseInstantTime(), slice.getFileId()));
           r.seal();
+          mappedFileIds.add(slice.getFileId());

Review Comment:
   Good catch. I fixed this in a later patch, but missed updating it here.
   
   https://github.com/apache/hudi/pull/13312/files#r2112561019 
   You can check it out there. 
   
   There is a known limitation to this: during the write we will spin up a Spark task for every file group of each MDT partition we touch, because without triggering an action on the records we cannot know which file groups actually have records to be written. 
   
   But since we do not want to trigger any action, we return every file group in the partitions we touch from here. 
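   To illustrate why collecting `mappedFileIds` this way is tricky: the map over `records` is lazy, so the set is only populated once something forces evaluation. A minimal sketch of the same pitfall using plain `java.util.stream` (all names here, such as `LazyMappingSketch` and `mapKeyToFileId`, are hypothetical stand-ins, not Hudi code):
   
   ```java
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;
   
   public class LazyMappingSketch {
   
     // Hypothetical stand-in for a key-to-file-group mapping; not real Hudi code.
     static String mapKeyToFileId(String recordKey, int fileGroupCount) {
       return "file-" + (Math.abs(recordKey.hashCode()) % fileGroupCount);
     }
   
     // Returns {mappedFileIds size before the terminal operation, size after}.
     public static int[] run() {
       List<String> recordKeys = List.of("key1", "key2", "key3");
       int fileGroupCount = 2;
       Set<String> mappedFileIds = new HashSet<>();
   
       // Like records.map(...) on a Spark RDD, Stream.map is lazy: the lambda
       // (and its side effect on mappedFileIds) has not executed yet.
       Stream<String> mapped = recordKeys.stream().map(k -> {
         String fileId = mapKeyToFileId(k, fileGroupCount);
         mappedFileIds.add(fileId); // side effect, deferred until an "action"
         return k + "@" + fileId;
       });
   
       int before = mappedFileIds.size(); // still 0: nothing has been evaluated
   
       // A terminal operation (the analogue of a Spark action) forces evaluation.
       mapped.collect(Collectors.toList());
   
       int after = mappedFileIds.size(); // populated only now
       return new int[] {before, after};
     }
   
     public static void main(String[] args) {
       int[] sizes = run();
       System.out.println("before action: " + sizes[0] + ", after action: " + sizes[1]);
     }
   }
   ```
   
   On an actual Spark RDD there is a second problem the stream analogy does not show: the lambda runs on executors, so a driver-side `HashSet` captured in the closure would stay empty on the driver even after an action. That is why, without triggering an action, the safe option is to return every file group of the touched partitions.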
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1481,18 +1487,21 @@ protected HoodieData<HoodieRecord> prepRecords(Map<String, HoodieData<HoodieReco
         ValidationUtils.checkArgument(fileGroupCount > 0, String.format("FileGroup count for MDT partition %s should be > 0", partitionName));
 
         List<FileSlice> finalFileSlices = fileSlices;
+        Set<String> mappedFileIds = new HashSet<>();
         HoodieData<HoodieRecord> rddSinglePartitionRecords = records.map(r -> {
           FileSlice slice = finalFileSlices.get(HoodieTableMetadataUtil.mapRecordKeyToFileGroupIndex(r.getRecordKey(),
               fileGroupCount));
           r.unseal();
           r.setCurrentLocation(new HoodieRecordLocation(slice.getBaseInstantTime(), slice.getFileId()));
           r.seal();
+          mappedFileIds.add(slice.getFileId());

Review Comment:
   Will update this patch accordingly. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
