beyond1920 commented on code in PR #8072:
URL: https://github.com/apache/hudi/pull/8072#discussion_r1119817501
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkInsertOverwriteCommitActionExecutor.java:
##########
@@ -66,7 +67,9 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> execute()
{
@Override
protected Partitioner getPartitioner(WorkloadProfile profile) {
return table.getStorageLayout().layoutPartitionerClass()
- .map(c -> getLayoutPartitioner(profile, c))
+ .map(c ->
c.equals(HoodieLayoutConfig.SIMPLE_BUCKET_LAYOUT_PARTITIONER_CLASS_NAME)
Review Comment:
@KnightChess Thanks for your advice.
Removing tagLocation could also fix this problem. However, I prefer to fix
it by generating new file ids because:
1. Removing tagLocation would change the write stats; for example, the
updated count would be missing.
2. It's better to keep the same behavior for all index types instead of
removing tagLocation only in insert overwrite for bucket index tables.
That said, removing tagLocation is a good improvement to speed up insert
overwrite. I will create a new JIRA to track it. Maybe we could use bulk
insert to do insert overwrite for all index types. WDYT?
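
To illustrate point 1, here is a rough sketch with made-up names (this is not Hudi's API; `TagLocationStatsSketch` and `countUpdates` are hypothetical). It assumes the updated count is derived from records that tagging matched to an existing key, so skipping the tagging step would report every record as an insert:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hudi code: shows how a tagging step decides
// whether an incoming record is counted as an update or as an insert.
public class TagLocationStatsSketch {

    // Returns how many incoming records would be reported as updates.
    static long countUpdates(List<String> incomingKeys,
                             Set<String> existingKeys,
                             boolean tagLocation) {
        if (!tagLocation) {
            // Without tagging, nothing is known about existing records,
            // so every incoming record looks like an insert.
            return 0;
        }
        // With tagging, records whose key already exists count as updates.
        return incomingKeys.stream().filter(existingKeys::contains).count();
    }

    public static void main(String[] args) {
        List<String> incoming = List.of("k1", "k2", "k3");
        Set<String> existing = Set.of("k1", "k2");
        System.out.println("updates with tagging: "
                + countUpdates(incoming, existing, true));
        System.out.println("updates without tagging: "
                + countUpdates(incoming, existing, false));
    }
}
```

With the sample keys above, the tagged run reports two updates and the untagged run reports none, which is the stats difference I would like to avoid.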