[GitHub] [hudi] shenh062326 commented on a change in pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

GitBox Sat, 01 Aug 2020 17:35:07 -0700


shenh062326 commented on a change in pull request #1868:
URL: https://github.com/apache/hudi/pull/1868#discussion_r464014566




##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -270,20 +273,24 @@ public int getPartition(Object key) {
       return updateLocationToBucket.get(location.getFileId());
     } else {
       String partitionPath = keyLocation._1().getPartitionPath();
-      List<InsertBucket> targetBuckets = partitionPathToInsertBuckets.get(partitionPath);
+      List<InsertBucketCumulativeWeightPair> targetBuckets = partitionPathToInsertBucketInfos.get(partitionPath);
       // pick the target bucket to use based on the weights.
-      double totalWeight = 0.0;
       final long totalInserts = Math.max(1, profile.getWorkloadStat(partitionPath).getNumInserts());
       final long hashOfKey = NumericUtils.getMessageDigestHash("MD5", keyLocation._1().getRecordKey());
       final double r = 1.0 * Math.floorMod(hashOfKey, totalInserts) / totalInserts;
-      for (InsertBucket insertBucket : targetBuckets) {
-        totalWeight += insertBucket.weight;
-        if (r <= totalWeight) {
-          return insertBucket.bucketNumber;
-        }
+
+      int index = Collections.binarySearch(targetBuckets, new InsertBucketCumulativeWeightPair(new InsertBucket(), r));
+
+      if (index >= 0) {

Review comment:
       The last bucket's cumulative weight should be 1, and since r is floorMod(hashOfKey, totalInserts) / totalInserts, which always lies in [0, 1), the search entry should not be greater than the last entry.
   Even if the search entry is greater than all entries, it will return the last bucketNumber.
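The change replaces a linear scan over bucket weights with a binary search over precomputed cumulative weights. A minimal Java sketch of that idea follows, including the "return the last bucket" fallback described in the comment above. The class and method names (CumulativeWeightBucketPicker, BucketWeight, buildCumulative, pickLinear, pickBinary, toUnitInterval) are hypothetical and are not Hudi's API; how the real code handles a negative index is not visible in the diff excerpt, so the insertion-point handling and clamping here are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: names are illustrative and are not Hudi's actual classes.
public class CumulativeWeightBucketPicker {

  // A bucket number paired with the running total of bucket weights up to and including it.
  static final class BucketWeight implements Comparable<BucketWeight> {
    final int bucketNumber;
    final double cumulativeWeight;

    BucketWeight(int bucketNumber, double cumulativeWeight) {
      this.bucketNumber = bucketNumber;
      this.cumulativeWeight = cumulativeWeight;
    }

    // Only the cumulative weight takes part in ordering, which is all the search needs.
    @Override
    public int compareTo(BucketWeight other) {
      return Double.compare(cumulativeWeight, other.cumulativeWeight);
    }
  }

  // Built once per partition. Assumes every weight is positive, so the list is strictly increasing.
  static List<BucketWeight> buildCumulative(int[] bucketNumbers, double[] weights) {
    List<BucketWeight> result = new ArrayList<>();
    double running = 0.0;
    for (int i = 0; i < weights.length; i++) {
      running += weights[i];
      result.add(new BucketWeight(bucketNumbers[i], running));
    }
    return result;
  }

  // O(n) reference: the first bucket whose cumulative weight is >= r, like the removed loop.
  // Falling back to the last bucket is an assumption of this sketch.
  static int pickLinear(List<BucketWeight> buckets, double r) {
    for (BucketWeight b : buckets) {
      if (r <= b.cumulativeWeight) {
        return b.bucketNumber;
      }
    }
    return buckets.get(buckets.size() - 1).bucketNumber;
  }

  // O(log n) lookup that returns the same bucket as pickLinear.
  static int pickBinary(List<BucketWeight> buckets, double r) {
    // The probe's bucket number is a placeholder; only its weight is compared.
    int index = Collections.binarySearch(buckets, new BucketWeight(-1, r));
    if (index < 0) {
      // No exact match: binarySearch returned -(insertionPoint) - 1, and the
      // insertion point is the first entry whose cumulative weight exceeds r.
      index = -index - 1;
    }
    // If r exceeds every entry, the insertion point equals size(); clamp to the last bucket.
    index = Math.min(index, buckets.size() - 1);
    return buckets.get(index).bucketNumber;
  }

  // Same mapping as the diff: floorMod with a positive divisor is in [0, totalInserts), so r is in [0, 1).
  static double toUnitInterval(long hash, long totalInserts) {
    return 1.0 * Math.floorMod(hash, totalInserts) / totalInserts;
  }

  public static void main(String[] args) {
    List<BucketWeight> buckets =
        buildCumulative(new int[] {7, 8, 9}, new double[] {0.5, 0.25, 0.25});
    for (double r : new double[] {0.0, 0.5, 0.6, 0.75, 0.9}) {
      System.out.println(r + " -> bucket " + pickBinary(buckets, r));
    }
  }
}
```

Building the cumulative list once per partition makes each key lookup O(log n) instead of O(n) in the number of insert buckets. The sketch requires positive weights: a zero-weight bucket would produce equal cumulative entries, and Collections.binarySearch does not guarantee which of several equal elements it finds, so it could disagree with the linear scan.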



