sandy du created HUDI-5669:
------------------------------
Summary: BucketIndexPartitioner may cause task data skew
Key: HUDI-5669
URL: https://issues.apache.org/jira/browse/HUDI-5669
Project: Apache Hudi
Issue Type: Bug
Reporter: sandy du
Fix For: 0.14.0
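In the partition method of BucketIndexPartitioner, curBucket is effectively
hashed twice: it is first derived from the record key by
BucketIdentifier.getBucketId, and then hashed again when hashCode() is called
on the concatenation of the partition path and curBucket. The second hash
decides which task a bucket is sent to, so distinct buckets can collide on the
same task, which may cause data skew.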
{code:java}
public int partition(HoodieKey key, int numPartitions) {
  int curBucket = BucketIdentifier.getBucketId(key, indexKeyFields, bucketNum);
  int globalHash = (key.getPartitionPath() + curBucket).hashCode() & Integer.MAX_VALUE;
  return BucketIdentifier.mod(globalHash, numPartitions);
}
{code}
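One way to check the claim without a Hudi dependency is the rough sketch below. It reproduces only the second hashing step from the snippet above and counts how many (partition path, bucket) pairs land on each task. The class name BucketSkewSketch, the helper names, and the sample partition paths are hypothetical. It also assumes that BucketIdentifier.mod behaves as a plain remainder on a non-negative input.

```java
import java.util.Arrays;
import java.util.List;

public class BucketSkewSketch {

    // Mirrors the expression from the issue: hash the string
    // "<partitionPath><bucketId>", clear the sign bit, then take the
    // remainder by the number of tasks.
    // Assumption: BucketIdentifier.mod(x, y) is equivalent to x % y here.
    public static int taskFor(String partitionPath, int bucketId, int numPartitions) {
        int globalHash = (partitionPath + bucketId).hashCode() & Integer.MAX_VALUE;
        return globalHash % numPartitions;
    }

    // Counts how many (partitionPath, bucketId) pairs are routed to each task.
    public static int[] bucketsPerTask(List<String> partitionPaths, int bucketNum, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String path : partitionPaths) {
            for (int bucket = 0; bucket < bucketNum; bucket++) {
                counts[taskFor(path, bucket, numPartitions)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical partition paths, chosen only for illustration.
        List<String> paths = Arrays.asList("dt=2023-01-01", "dt=2023-01-02", "dt=2023-01-03");
        int bucketNum = 8;
        // One task per (path, bucket) pair, so an even spread would be
        // exactly one bucket per task.
        int numPartitions = paths.size() * bucketNum;
        int[] counts = bucketsPerTask(paths, bucketNum, numPartitions);
        System.out.println(Arrays.toString(counts));
    }
}
```

With numPartitions equal to the total number of buckets, any task count above 1 in the printed array means several buckets share one task while other tasks receive none, which is the skew described in this issue.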