yuemeng created HUDI-4506:
-----------------------------

             Summary: make BucketIndexPartitioner distribute data more balanced
                 Key: HUDI-4506
                 URL: https://issues.apache.org/jira/browse/HUDI-4506
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: yuemeng
            Assignee: yuemeng


Currently, BucketIndexPartitioner uses the partition path and bucket id to 
route records to the write subtasks.

Suppose:
 # the bucket num is smaller than the write task num
 # old partitions are never modified, or only a few old partition paths are 
modified

Then this partitioning logic

{code}
int globalHash = (key.getPartitionPath() + curBucket + hashKeys.hashCode()).hashCode() & Integer.MAX_VALUE;
{code}

can send all of the data to only some of the write tasks, leaving the other 
tasks idle.
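
To illustrate the skew, here is a minimal stand-alone sketch, not the actual Hudi class. It assumes that the write task is chosen as globalHash modulo the number of write tasks (that step is not shown in the snippet above) and that hashKeys.hashCode() is the same for every record of a table. The class name BucketSkewSketch, the methods taskFor and tasksUsed, and the sample partition path are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; this is not Hudi's BucketIndexPartitioner.
class BucketSkewSketch {

    // Mirrors the expression quoted in the issue. hashKeysHash stands in for
    // hashKeys.hashCode() and is assumed to be constant for a given table.
    // The final modulo is an assumption about how the task is selected.
    static int taskFor(String partitionPath, int curBucket, int hashKeysHash, int numTasks) {
        int globalHash = (partitionPath + curBucket + hashKeysHash).hashCode() & Integer.MAX_VALUE;
        return globalHash % numTasks;
    }

    // Collects the write tasks that receive data when every bucket of every
    // listed partition is written.
    static Set<Integer> tasksUsed(String[] partitionPaths, int bucketNum, int hashKeysHash, int numTasks) {
        Set<Integer> used = new TreeSet<>();
        for (String path : partitionPaths) {
            for (int bucket = 0; bucket < bucketNum; bucket++) {
                used.add(taskFor(path, bucket, hashKeysHash, numTasks));
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // Only the newest partition is written: 4 buckets, 16 write tasks.
        String[] active = {"2022-07-29"};
        Set<Integer> used = tasksUsed(active, 4, 0, 16);
        System.out.println("write tasks receiving data: " + used.size() + " of 16");
    }
}
```

With a single active partition there are only bucket num distinct (partition path, bucket id) pairs, so at most that many write tasks can receive data no matter how many tasks are configured.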



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
