[GitHub] [hudi] beyond1920 commented on a diff in pull request #8072: [HUDI-5857] Index overwrite into bucket table would generate new file group id

via GitHub Tue, 28 Feb 2023 01:56:08 -0800


beyond1920 commented on code in PR #8072:
URL: https://github.com/apache/hudi/pull/8072#discussion_r1119817501



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkInsertOverwriteCommitActionExecutor.java:
##########
@@ -66,7 +67,9 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> execute() {
   @Override
   protected Partitioner getPartitioner(WorkloadProfile profile) {
     return table.getStorageLayout().layoutPartitionerClass()
-        .map(c -> getLayoutPartitioner(profile, c))
+        .map(c -> c.equals(HoodieLayoutConfig.SIMPLE_BUCKET_LAYOUT_PARTITIONER_CLASS_NAME)

Review Comment:
   @KnightChess Thanks for your advice.
   Removing tagLocation could also fix this problem. However, I prefer to fix 
it by generating new file ids because:
   1. Removing tagLocation would change the stats; for example, the updated 
count would be missing.
   2. It's better to keep the same behavior for all index types instead of 
removing tagLocation in insert overwrite only for bucket index tables.
   That said, removing tagLocation is a good improvement to speed up insert 
overwrite. I will create a new JIRA to track it. Maybe we could use bulk 
insert to do insert overwrite for all index types. WDYT?
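
To illustrate what "generating new file ids" means here, a rough standalone sketch follows. It is not the code in this PR: the class `OverwriteBucketSketch` and its methods are hypothetical, and the file id layout (an 8-digit zero-padded bucket number followed by the tail of a random UUID) is an assumption made for the example. The point is only that each bucket keeps its number while insert overwrite writes to a fresh file group.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only; these are not Hudi classes.
public class OverwriteBucketSketch {

  // Assumption: a bucket file id is the zero-padded bucket number
  // followed by the remainder of a random UUID.
  static String newFileIdForBucket(int bucketId) {
    String uuid = UUID.randomUUID().toString();
    return String.format("%08d", bucketId) + uuid.substring(8);
  }

  // Recover the bucket number from the first 8 characters of the file id.
  static int bucketIdOf(String fileId) {
    return Integer.parseInt(fileId.substring(0, 8));
  }

  // For insert overwrite, give every bucket a fresh file id instead of
  // reusing the id of the existing file group in that bucket.
  static Map<Integer, String> assignNewFileIds(int numBuckets) {
    Map<Integer, String> bucketToFileId = new HashMap<>();
    for (int bucket = 0; bucket < numBuckets; bucket++) {
      bucketToFileId.put(bucket, newFileIdForBucket(bucket));
    }
    return bucketToFileId;
  }

  public static void main(String[] args) {
    String existing = newFileIdForBucket(3);
    String replacement = newFileIdForBucket(3);
    // Same bucket number, different file group.
    System.out.println("existing:    " + existing);
    System.out.println("replacement: " + replacement);
    System.out.println("same bucket: " + (bucketIdOf(existing) == bucketIdOf(replacement)));
  }
}
```

With this approach the records still go through tagging, so the write stats stay consistent across index types; only the target file group changes.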



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

