nsivabalan commented on code in PR #8914:
URL: https://github.com/apache/hudi/pull/8914#discussion_r1224748569


##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java:
##########
@@ -149,9 +161,9 @@ protected void commit(String instantTime, Map<MetadataPartitionType, HoodieData<
         writeClient.getHeartbeatClient().start(instantTime);
       }
 
-      List<WriteStatus> statuses = preppedRecordList.size() > 0
-          ? writeClient.upsertPreppedRecords(preppedRecordList, instantTime)
-          : Collections.emptyList();
+      List<WriteStatus> statuses = isInitializing
+          ? writeClient.bulkInsertPreppedRecords(preppedRecordList, instantTime, Option.empty())

Review Comment:
   The major reason to use bulkInsert is that it uses a custom partitioner based on file group, so each Spark task receives only the records pertaining to one file group of interest.
   
   We can try to incorporate that here as well. Especially with RLI (the record-level index), the mapping of records to file groups is based on a hash, so we should not have records destined for different file groups routed to the same Spark task.
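
   To illustrate the idea behind the file-group-based partitioner mentioned above, here is a minimal, hedged sketch in plain Java (no Spark dependency). The hashing function `mapRecordKeyToFileGroupIndex` and the class name are hypothetical stand-ins, not Hudi's actual implementation; the point is only that a deterministic hash of the record key decides the file group, so a partitioner keyed on that hash can guarantee each task sees exactly one file group's records.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Sketch only: mimics routing prepped metadata records so that each
   // bucket (i.e. each Spark task under a custom partitioner) holds the
   // records of exactly one file group.
   public class FileGroupPartitionerSketch {

     // Hypothetical stand-in for the real key-to-file-group hashing:
     // deterministically map a record key to one of numFileGroups buckets.
     static int mapRecordKeyToFileGroupIndex(String recordKey, int numFileGroups) {
       int h = 0;
       for (char c : recordKey.toCharArray()) {
         h = 31 * h + c;
       }
       return Math.floorMod(h, numFileGroups);
     }

     // Group record keys by file-group index. A custom Spark partitioner
     // would achieve the same layout across tasks.
     static Map<Integer, List<String>> partition(List<String> recordKeys, int numFileGroups) {
       Map<Integer, List<String>> buckets = new HashMap<>();
       for (String key : recordKeys) {
         int fg = mapRecordKeyToFileGroupIndex(key, numFileGroups);
         buckets.computeIfAbsent(fg, k -> new ArrayList<>()).add(key);
       }
       return buckets;
     }

     public static void main(String[] args) {
       List<String> keys = List.of("key1", "key2", "key3", "key4");
       Map<Integer, List<String>> buckets = partition(keys, 2);
       // The same key always maps to the same file group, so records for
       // one file group never straddle two buckets.
       for (String k : keys) {
         if (mapRecordKeyToFileGroupIndex(k, 2) != mapRecordKeyToFileGroupIndex(k, 2)) {
           throw new AssertionError("mapping must be deterministic");
         }
       }
       System.out.println("buckets=" + buckets.size());
     }
   }
   ```

   Bulk insert with such a partitioner keeps each file group's writes confined to one task; a plain upsert path would not give that guarantee.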



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to