zhangyue19921010 commented on code in PR #13060:
URL: https://github.com/apache/hudi/pull/13060#discussion_r2025037864
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/prune/PrimaryKeyPruners.java:
##########
@@ -45,7 +45,7 @@ public class PrimaryKeyPruners {
public static final int BUCKET_ID_NO_PRUNING = -1;
-  public static int getBucketId(List<ResolvedExpression> hashKeyFilters, Configuration conf) {
+  public static int getBucketFieldHashing(List<ResolvedExpression> hashKeyFilters, Configuration conf) {
Review Comment:
Sorry Danny, I didn't get this. Is it possible to get the full partition path during the original dataBucket computation?
```java
@Override
public Result applyFilters(List<ResolvedExpression> filters) {
  List<ResolvedExpression> simpleFilters = filterSimpleCallExpression(filters);
  Tuple2<List<ResolvedExpression>, List<ResolvedExpression>> splitFilters =
      splitExprByPartitionCall(simpleFilters, this.partitionKeys, this.tableRowType);
  this.predicates = ExpressionPredicates.fromExpression(splitFilters.f0);
  this.columnStatsProbe = ColumnStatsProbe.newInstance(splitFilters.f0);
  this.partitionPruner = createPartitionPruner(splitFilters.f1, columnStatsProbe);
  this.dataBucket = getDataBucket(splitFilters.f0);
  // refuse all the filters now
  return SupportsFilterPushDown.Result.of(new ArrayList<>(splitFilters.f1), new ArrayList<>(filters));
}
```
What this PR does is compute the hashing value and pass it to `getFilesInPartitions`, then compute numBuckets, and finally compute the final bucket id as `hashing value % numBuckets`.
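To make the two-phase split concrete, here is a minimal sketch of that flow. The method names (`hashKeyFieldsHashing`, `bucketId`) and the string-join hashing are illustrative assumptions, not the actual Hudi `BucketIdentifier` implementation; the point is only that the hashing can be computed before the per-partition bucket count is known, and the modulo applied later:

```java
import java.util.Arrays;
import java.util.List;

public class BucketIdSketch {

  // Phase 1 (at filter push-down time): derive a hashing value from the
  // hash-key field values alone; numBuckets is not yet known here.
  static int hashKeyFieldsHashing(List<String> hashKeyValues) {
    // Mask with Integer.MAX_VALUE to keep the value non-negative,
    // since hashCode() may return a negative int.
    return String.join(",", hashKeyValues).hashCode() & Integer.MAX_VALUE;
  }

  // Phase 2 (e.g. inside getFilesInPartitions, once the partition's
  // bucket count is resolved): final bucket id = hashing % numBuckets.
  static int bucketId(int hashing, int numBuckets) {
    return hashing % numBuckets;
  }

  public static void main(String[] args) {
    int hashing = hashKeyFieldsHashing(Arrays.asList("uuid-001"));
    int id = bucketId(hashing, 8);
    System.out.println("bucket id = " + id); // always in [0, 8)
  }
}
```

The deferral matters because the bucket count can vary per partition, so the modulo cannot be applied until the target partition is known.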
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]