nsivabalan commented on code in PR #8914:
URL: https://github.com/apache/hudi/pull/8914#discussion_r1224748569
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java:
##########
@@ -149,9 +161,9 @@ protected void commit(String instantTime,
Map<MetadataPartitionType, HoodieData<
writeClient.getHeartbeatClient().start(instantTime);
}
- List<WriteStatus> statuses = preppedRecordList.size() > 0
- ? writeClient.upsertPreppedRecords(preppedRecordList, instantTime)
- : Collections.emptyList();
+ List<WriteStatus> statuses = isInitializing
+     ? writeClient.bulkInsertPreppedRecords(preppedRecordList, instantTime, Option.empty())
Review Comment:
The major reason to use bulkInsert is that we use a custom partitioner based on
file group, so the Spark tasks are laid out such that each task gets only the
records pertaining to one file group of interest.
We can try to incorporate that here as well. Especially with RLI (record-level
index), the mapping of records to file groups is hash based, so we can't have
records destined for different file groups routed to the same Spark task.
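
To illustrate the point about the partitioner: below is a minimal, hypothetical sketch (not Hudi's actual implementation) of a hash-based record-key-to-file-group mapping of the kind RLI relies on. The class and method names (`FileGroupRouting`, `fileGroupIndex`) are invented for illustration; the key property shown is that the mapping is a pure, deterministic hash, so a repartitioner keyed on this index can guarantee each task receives records for exactly one file group.

```java
import java.nio.charset.StandardCharsets;

public class FileGroupRouting {

  // Hypothetical helper: stable hash of the record key, modulo the number
  // of RLI file groups. Deterministic, so the same key always lands in the
  // same file group regardless of which task computes it.
  static int fileGroupIndex(String recordKey, int numFileGroups) {
    int h = 0;
    for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
      h = 31 * h + (b & 0xff);
    }
    return Math.floorMod(h, numFileGroups);
  }

  public static void main(String[] args) {
    int numFileGroups = 4;
    // Same key -> same file group on every invocation, which is what lets a
    // custom partitioner route one file group's records to one Spark task.
    int idx = fileGroupIndex("key-001", numFileGroups);
    System.out.println(idx == fileGroupIndex("key-001", numFileGroups));
    System.out.println(idx >= 0 && idx < numFileGroups);
  }
}
```

A plain upsert path without such a partitioner gives no guarantee about which task sees which keys, which is why the comment argues the bulkInsert path should carry the custom partitioner along.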
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]