sandy du created HUDI-5669:
------------------------------

             Summary: BucketIndexPartitioner may cause task data skew
                 Key: HUDI-5669
                 URL: https://issues.apache.org/jira/browse/HUDI-5669
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: sandy du
             Fix For: 0.14.0


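The partition method of BucketIndexPartitioner applies hashCode twice: curBucket 
is already derived from a hash of the key, and the method then calls hashCode 
again on partitionPath + curBucket. This second hash may cause data skew across 
tasks.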
{code:java}
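public int partition(HoodieKey key, int numPartitions) {
  // First hash: the bucket id is computed from the record's index key fields.
  int curBucket = BucketIdentifier.getBucketId(key, indexKeyFields, bucketNum);
  // Second hash: String.hashCode over partitionPath + bucket id, sign bit cleared.
  int globalHash = (key.getPartitionPath() + curBucket).hashCode() & Integer.MAX_VALUE;
  return BucketIdentifier.mod(globalHash, numPartitions);
}
{code}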
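
To make the concern concrete, here is a minimal stand-alone sketch that counts how many (partition path, bucket) pairs each task would receive. It does not use any Hudi classes. The class name, the partition path names, and the routeByOffset alternative are assumptions made for illustration, not the fix adopted for this issue. routeByStringHash mirrors the routing in the snippet above. routeByOffset hashes only the partition path and then offsets by the bucket id, so the buckets of one partition path land on consecutive tasks.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; no Hudi classes are used.
class BucketSkewSketch {

  // Mirrors the routing in the reported snippet: hash the string
  // partitionPath + bucketId, clear the sign bit, then take the modulo.
  static int routeByStringHash(String partitionPath, int bucketId, int numTasks) {
    int globalHash = (partitionPath + bucketId).hashCode() & Integer.MAX_VALUE;
    return globalHash % numTasks;
  }

  // Assumed alternative (not taken from the issue): hash only the partition
  // path to pick a starting task, then offset by the bucket id.
  static int routeByOffset(String partitionPath, int bucketId, int numTasks) {
    int start = (partitionPath.hashCode() & Integer.MAX_VALUE) % numTasks;
    return (start + bucketId) % numTasks;
  }

  // Counts how many (partition path, bucket) pairs are routed to each task.
  // Bucket ids are enumerated directly instead of being derived from keys.
  static int[] loads(int numPartitionPaths, int bucketNum, int numTasks,
                     boolean useStringHash) {
    int[] load = new int[numTasks];
    for (int p = 0; p < numPartitionPaths; p++) {
      String path = "part-" + p; // made-up partition path names
      for (int b = 0; b < bucketNum; b++) {
        int task = useStringHash
            ? routeByStringHash(path, b, numTasks)
            : routeByOffset(path, b, numTasks);
        load[task]++;
      }
    }
    return load;
  }

  public static void main(String[] args) {
    int paths = 4, buckets = 16, tasks = 16;
    System.out.println("string hash: "
        + Arrays.toString(loads(paths, buckets, tasks, true)));
    System.out.println("offset:      "
        + Arrays.toString(loads(paths, buckets, tasks, false)));
  }
}
```

When bucketNum equals the number of tasks, the offset variant gives every task exactly one bucket per partition path. The string-hash variant has no such guarantee, so some tasks can receive several buckets while others receive none.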



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
