nsivabalan commented on code in PR #8914:
URL: https://github.com/apache/hudi/pull/8914#discussion_r1225422681
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java:
##########
@@ -149,9 +161,9 @@ protected void commit(String instantTime, Map<MetadataPartitionType, HoodieData<
writeClient.getHeartbeatClient().start(instantTime);
}
- List<WriteStatus> statuses = preppedRecordList.size() > 0
- ? writeClient.upsertPreppedRecords(preppedRecordList, instantTime)
- : Collections.emptyList();
+ List<WriteStatus> statuses = isInitializing
+ ? writeClient.bulkInsertPreppedRecords(preppedRecordList, instantTime, Option.empty())
Review Comment:
The records-to-file-group mapping is deterministic, and only one file can be
written per file group. For example, if we instantiate col stats with 4 file
groups, we should spin up 4 Spark tasks, and each Spark task should get only
the records pertaining to its file group of interest (remember, records are
mapped to file groups based on hashing). So if one Spark task gets records for
all file groups, we could end up with n*m files (where n is the number of
Spark tasks and m is the number of file groups), which may not work. We need
only m files created, and m Spark tasks should spin up, with each Spark task
writing to just 1 file group.
Hope that makes sense.
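As a rough illustration of the point above (a minimal sketch, not Hudi's
actual partitioner or hash function; the class and method names here are
hypothetical), hashing record keys to file groups and grouping records so that
each of the m tasks writes exactly one file group could look like:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FileGroupHashing {

  // Deterministically map a record key to one of numFileGroups file groups.
  // floorMod keeps the index non-negative even for negative hash codes.
  public static int fileGroupFor(String recordKey, int numFileGroups) {
    return Math.floorMod(recordKey.hashCode(), numFileGroups);
  }

  // Group record keys by their file group, yielding at most m partitions.
  // One "task" per partition then writes a single file for its file group,
  // giving m files total instead of n*m.
  public static Map<Integer, List<String>> partitionByFileGroup(List<String> keys, int m) {
    return keys.stream().collect(Collectors.groupingBy(k -> fileGroupFor(k, m)));
  }

  public static void main(String[] args) {
    List<String> keys = List.of("stats_a", "stats_b", "stats_c", "stats_d", "stats_e");
    Map<Integer, List<String>> tasks = partitionByFileGroup(keys, 4);
    // At most 4 partitions exist, so at most 4 files are written.
    System.out.println(tasks.size() <= 4);
  }
}
```

Because the mapping is a pure function of the key, the same record always
lands in the same file group, which is what makes the one-file-per-file-group
invariant achievable.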
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]