yuemeng created HUDI-4506:
-----------------------------
Summary: Make BucketIndexPartitioner distribute data more evenly
Key: HUDI-4506
URL: https://issues.apache.org/jira/browse/HUDI-4506
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: yuemeng
Assignee: yuemeng
Currently, BucketIndexPartitioner uses the partition path and bucket id to
assign data to write subtasks.
Suppose:
# bucket num < write tasks num
# old partitions are never modified, or only a few old partition paths are
modified.
In that case, this partitioning logic
{code}
int globalHash = (key.getPartitionPath() + curBucket +
hashKeys.hashCode()).hashCode() & Integer.MAX_VALUE;
{code}
will cause the data to be distributed to only some of the write tasks.
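
To make the skew concrete, here is a minimal, self-contained sketch, not the actual Hudi code. `hashBased` imitates the hashing style quoted above but drops the `hashKeys.hashCode()` term for brevity. `offsetBased` is a hypothetical alternative introduced only for comparison: it hashes the partition path to a starting task and adds the bucket id as an offset. The class name, method names, and the sample partition value are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

class BucketPartitionSketch {

    // Imitates the quoted logic: hash the concatenated string, then take the
    // remainder. Two buckets of one partition may collide on the same task.
    static int hashBased(String partitionPath, int bucketId, int numTasks) {
        int globalHash = (partitionPath + bucketId).hashCode() & Integer.MAX_VALUE;
        return globalHash % numTasks;
    }

    // Hypothetical alternative (an assumption, not taken from the issue):
    // pick a starting task from the partition path, then offset by bucket id.
    // When bucketNum <= numTasks, buckets of one partition never collide.
    static int offsetBased(String partitionPath, int bucketId, int numTasks) {
        int partitionIndex = (partitionPath.hashCode() & Integer.MAX_VALUE) % numTasks;
        return (partitionIndex + bucketId) % numTasks;
    }

    // Counts how many distinct write tasks receive the buckets of one partition.
    static int distinctTasks(String partitionPath, int bucketNum, int numTasks, boolean useOffset) {
        Set<Integer> tasks = new HashSet<>();
        for (int bucket = 0; bucket < bucketNum; bucket++) {
            tasks.add(useOffset
                ? offsetBased(partitionPath, bucket, numTasks)
                : hashBased(partitionPath, bucket, numTasks));
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        int bucketNum = 4;   // bucket num < write tasks num, as in the scenario above
        int numTasks = 16;
        String partition = "dt=2022-07-29"; // illustrative partition path

        System.out.println("hash-based tasks used:   "
            + distinctTasks(partition, bucketNum, numTasks, false));
        System.out.println("offset-based tasks used: "
            + distinctTasks(partition, bucketNum, numTasks, true));
    }
}
```

Neither variant can use more tasks than there are active (partition, bucket) pairs, because every record of a bucket must reach the same write task. The difference is that the hash-based form may additionally collapse several buckets onto one task, while the offset form spreads the buckets of a partition over consecutive tasks.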