nsivabalan commented on code in PR #8914:
URL: https://github.com/apache/hudi/pull/8914#discussion_r1224748569
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java:
##########
@@ -149,9 +161,9 @@ protected void commit(String instantTime,
Map<MetadataPartitionType, HoodieData<
writeClient.getHeartbeatClient().start(instantTime);
}
- List<WriteStatus> statuses = preppedRecordList.size() > 0
- ? writeClient.upsertPreppedRecords(preppedRecordList, instantTime)
- : Collections.emptyList();
+ List<WriteStatus> statuses = isInitializing
+     ? writeClient.bulkInsertPreppedRecords(preppedRecordList, instantTime, Option.empty())
Review Comment:
The major reason to use bulkInsert is that we use a custom partitioner based on
file group, so the Spark tasks are laid out such that each task gets only the
records pertaining to one file group of interest.
We can try to incorporate that here as well. Especially with RLI (record-level
index), the mapping of records to file groups is hash based, so we can't have
records destined for different file groups routed to the same Spark task.
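
To illustrate the point about the partitioner: below is a minimal, hypothetical sketch (not Hudi's actual implementation) of a hash-based record-key-to-file-group mapping of the kind RLI relies on. The class and method names (`FileGroupRouting`, `fileGroupIndex`) are invented for illustration; the key property shown is that the mapping is a pure, deterministic hash, so a repartitioner keyed on this index can guarantee each task receives records for exactly one file group.

```java
import java.nio.charset.StandardCharsets;

public class FileGroupRouting {

  // Hypothetical helper: stable hash of the record key, modulo the number
  // of RLI file groups. Deterministic, so the same key always lands in the
  // same file group regardless of which task computes it.
  static int fileGroupIndex(String recordKey, int numFileGroups) {
    int h = 0;
    for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
      h = 31 * h + (b & 0xff);
    }
    return Math.floorMod(h, numFileGroups);
  }

  public static void main(String[] args) {
    int numFileGroups = 4;
    // Same key -> same file group on every invocation, which is what lets a
    // custom partitioner route one file group's records to one Spark task.
    int idx = fileGroupIndex("key-001", numFileGroups);
    System.out.println(idx == fileGroupIndex("key-001", numFileGroups));
    System.out.println(idx >= 0 && idx < numFileGroups);
  }
}
```

A plain upsert path without such a partitioner gives no guarantee about which task sees which keys, which is why the comment argues the bulkInsert path should carry the custom partitioner along.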
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]