Re: [PR] HIVE-28798: Bucket Map Join partially using partition transforms [hive]

via GitHub Mon, 03 Mar 2025 19:15:18 -0800


okumin commented on code in PR #5670:
URL: https://github.com/apache/hive/pull/5670#discussion_r1978538214



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java:
##########
@@ -381,45 +381,84 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
   public static class SelectRule implements SemanticNodeProcessor {
 
     // For bucket columns
-    // If all the columns match to the parent, put them in the bucket cols
+    // If the projected columns are compatible with the bucketing requirement, put them in the bucket cols
     // else, add empty list.
+    public void putConvertedColNamesForBucket(
+        List<List<String>> parentColNamesList, List<CustomBucketFunction> parentBucketFunctions, SelectOperator selOp,
+        List<List<String>> newBucketColNamesList, List<CustomBucketFunction> newBucketFunctions) {
+      Preconditions.checkState(parentColNamesList.size() == parentBucketFunctions.size());
+      for (int i = 0; i < parentColNamesList.size(); i++) {
+        List<String> colNames = parentColNamesList.get(i);
+
+        List<String> newBucketColNames = new ArrayList<>();
+        boolean[] retainedColumns = new boolean[colNames.size()];
+        boolean allFound = true;
+        for (int j = 0; j < colNames.size(); j++) {
+          final String colName = colNames.get(j);
+          Optional<String> newColName = resolveNewColName(colName, selOp);
+          if (newColName.isPresent()) {
+            retainedColumns[j] = true;
+            newBucketColNames.add(newColName.get());
+          } else {
+            retainedColumns[j] = false;
+            allFound = false;
+          }
+        }
+
+        CustomBucketFunction bucketFunction = parentBucketFunctions.get(i);
+        if (allFound) {
+          newBucketColNamesList.add(newBucketColNames);
+          newBucketFunctions.add(bucketFunction);
+          break;
+        }
+        if (bucketFunction == null) {
+          // Hive's native bucketing is effective only when all the bucketing columns are used
+          newBucketColNamesList.add(new ArrayList<>());
+          newBucketFunctions.add(null);
+          break;
+        }
+        Optional<CustomBucketFunction> newBucketFunction = bucketFunction.select(retainedColumns);

Review Comment:
   Eventually, I want to implement Hive's v1 and v2 bucketing as
CustomBucketFunctions to unify the logic.
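
   For illustration, here is a minimal, self-contained sketch of the three
cases the loop body above distinguishes for a single bucketing candidate.
It is not the code in this PR: `BucketFunction`, `PerColumnFunction`,
`Projection`, and the rename `Map` (standing in for `resolveNewColName`) are
hypothetical names. The fallback taken when `select` returns empty is also an
assumption, since the quoted hunk ends before that branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class BucketProjectionSketch {

  /** Hypothetical stand-in for CustomBucketFunction; not Hive's real interface. */
  interface BucketFunction {
    /** Returns a function over the retained columns, or empty if it cannot be narrowed. */
    Optional<BucketFunction> select(boolean[] retainedColumns);
  }

  /** Toy function that stays usable as long as at least one source column survives. */
  static final class PerColumnFunction implements BucketFunction {
    final int arity;

    PerColumnFunction(int arity) {
      this.arity = arity;
    }

    @Override
    public Optional<BucketFunction> select(boolean[] retainedColumns) {
      int kept = 0;
      for (boolean retained : retainedColumns) {
        if (retained) {
          kept++;
        }
      }
      return kept == 0 ? Optional.empty() : Optional.of(new PerColumnFunction(kept));
    }
  }

  /** Projected bucket columns plus the function that applies to them (null = native). */
  static final class Projection {
    final List<String> colNames;
    final BucketFunction function;

    Projection(List<String> colNames, BucketFunction function) {
      this.colNames = colNames;
      this.function = function;
    }
  }

  /** renames maps a parent column name to its name after the select; absent = dropped. */
  static Projection project(List<String> parentCols, BucketFunction fn, Map<String, String> renames) {
    List<String> newCols = new ArrayList<>();
    boolean[] retained = new boolean[parentCols.size()];
    boolean allFound = true;
    for (int j = 0; j < parentCols.size(); j++) {
      String newName = renames.get(parentCols.get(j));
      retained[j] = newName != null;
      if (newName != null) {
        newCols.add(newName);
      } else {
        allFound = false;
      }
    }
    if (allFound) {
      // Case 1: every bucketing column survives, so the function is kept as-is.
      return new Projection(newCols, fn);
    }
    if (fn == null) {
      // Case 2: native bucketing needs all of its columns, so the trait is dropped.
      return new Projection(new ArrayList<>(), null);
    }
    // Case 3: ask the custom function to narrow itself to the retained columns.
    // Assumption: an empty result drops the trait, mirroring case 2.
    return fn.select(retained)
        .map(narrowed -> new Projection(newCols, narrowed))
        .orElseGet(() -> new Projection(new ArrayList<>(), null));
  }
}
```

   If v1 and v2 bucketing were expressed as such functions whose `select`
always returns empty on a partial projection, the `null` branch (case 2) would
no longer need special handling.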



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

