zabetak commented on code in PR #4442:
URL: https://github.com/apache/hive/pull/4442#discussion_r1285620207
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java:
##########

@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes
+ */
+public class HiveFilterTableFunctionTransposeRule extends RelOptRule {
+
+  public static final HiveFilterTableFunctionTransposeRule INSTANCE =
+      new HiveFilterTableFunctionTransposeRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveFilterTableFunctionTransposeRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveFilter.class, operand(HiveTableFunctionScan.class, any())),
+        relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final Filter filterRel = call.rel(0);
+    final HiveTableFunctionScan tfs = call.rel(1);
+
+    RexNode condition = filterRel.getCondition();
+    if (!HiveCalciteUtil.isDeterministic(condition)) {
+      return false;
+    }
+
+    // If the HiveTableFunctionScan is a special inline(array(...))
+    // udtf, the table is generated from this node. The underlying
+    // RelNode will be a dummy table so no filter condition should
+    // pass through the HiveTableFunctionScan.
+    if (isInlineArray(tfs)) {
+      return false;
+    }

Review Comment:
   Why do we need to check explicitly for inlineArray and rely upon FunctionRegistry? Isn't `RelColumnMapping` sufficient to give us this information?
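A minimal sketch of what the reviewer seems to be hinting at: deciding from the scan's own column mappings (Calcite's `TableFunctionScan#getColumnMappings()`) whether any filter can be traced back to an input, instead of consulting the FunctionRegistry. This is an illustration reusing the names and imports from the quoted file, not the PR's implementation, and it assumes the mappings are populated for genuine lateral views and left null or empty for the synthetic inline(array(...)) scan over a dummy table; whether that assumption holds is exactly the open question.

```java
// Illustrative only: rely on the column mappings instead of isInlineArray().
@Override
public boolean matches(RelOptRuleCall call) {
  final Filter filterRel = call.rel(0);
  final HiveTableFunctionScan tfs = call.rel(1);

  if (!HiveCalciteUtil.isDeterministic(filterRel.getCondition())) {
    return false;
  }
  // null = mapping unknown, empty = no output column is copied from an input:
  // in either case there is no input column to safely push the filter onto.
  Set<RelColumnMapping> mappings = tfs.getColumnMappings();
  return mappings != null && !mappings.isEmpty();
}
```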
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
         ast = ASTBuilder.subQuery(left, sqAlias);
         s = new Schema((Union) r, sqAlias);
       }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {
+      // In the case where the RelNode is a HiveTableFunctionScan, first we check
+      // to see if we can't optimize out the lateral view operator. We can optimize the
+      // operator out if only the udtf fields are grabbed out of the RelNode. If any
+      // of the base table fields need to be grabbed out, then a 'join' needs to be done
+      // and we need the lateral view.
+      //
+      // The structure of the AST in this "else if" branch will be:

Review Comment:
   Would it make sense to refactor this new code into a separate method? Maybe also add a test similar to `TestASTConverter` to enforce the expected output.

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -621,6 +703,45 @@ private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
     }
   }
+
+  /**
+   * Check to see if we can optimize out the lateral view operators
+   * We do not need to use the lateral view syntax if all of the fields
+   * selected out of the table scan come from the UDTF call. No join
+   * is needed because all the fields come from the table level rather
+   * than the row level.
+   */
+  private boolean canOptimizeOutLateralView(HiveTableFunctionScan htfs) {
+    Set<Integer> inputRefs = new HashSet<>();
+    if (this.select instanceof HiveProject) {

Review Comment:
   Is `this` operator always the parent of `htfs`, or can it be multiple levels up (grand-grand-...-parent)? If it is just the parent then I again get the feeling that this should be a separate CBO rule. If we are talking about operators multiple levels up then I am not sure what inputRefs refer to. The code here looks like something that the `HiveRelFieldTrimmer` should do before dropping unnecessary references.

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java:
##########

@@ -382,6 +387,11 @@ private static boolean validExchangeChild(HiveSortExchange sortNode) {
     return sortNode.getInput() instanceof Project;
   }
+
+  private static boolean validTableFunctionScanChild(HiveTableFunctionScan htfsNode) {
+    return htfsNode.getInputs().size() == 1 &&
+        (htfsNode.getInput(0) instanceof Project || htfsNode.getInput(0) instanceof HiveTableScan);

Review Comment:
   I am not sure how this will work if we are transposing Filters (and potentially other operators) below a `HiveTableFunctionScan`. Is htfsNode.getInputs().size() > 1 possible? Should we put guards against this case somewhere in the code?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
         ast = ASTBuilder.subQuery(left, sqAlias);
         s = new Schema((Union) r, sqAlias);
       }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {

Review Comment:
   I didn't follow all the details about this `canOptimizeOutLateralView` method and related code, but the comment gives me the impression that this is a CBO-level optimization. Would it be possible to model this with a CBO rule?
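On the `inputRefs` question raised against `canOptimizeOutLateralView` above, a rough sketch of what the check could look like when the parent of the scan is a `HiveProject`. It assumes, as the quoted LateralViewPlan code suggests, that the scan's row type lists the base-table columns first followed by the UDTF columns; the helper body, the use of `this.select`, and the omitted imports (`RexInputRef`, `RexVisitorImpl`, `HashSet`) are illustrative assumptions about the PR, not verified code.

```java
// Illustrative sketch: the lateral view can be dropped when the parent Project
// only references columns produced by the UDTF, i.e. columns positioned after
// the base-table columns in the HiveTableFunctionScan row type.
private boolean canOptimizeOutLateralView(HiveTableFunctionScan htfs) {
  if (!(this.select instanceof HiveProject)) {
    return false;
  }
  // Number of columns coming from the base table (the scan's single input).
  final int baseTableFieldCount = htfs.getInput(0).getRowType().getFieldCount();

  final Set<Integer> inputRefs = new HashSet<>();
  for (RexNode expr : ((HiveProject) this.select).getProjects()) {
    expr.accept(new RexVisitorImpl<Void>(true) {
      @Override
      public Void visitInputRef(RexInputRef ref) {
        inputRefs.add(ref.getIndex());
        return null;
      }
    });
  }
  // Only UDTF columns referenced -> no join with the base table is needed.
  return inputRefs.stream().allMatch(i -> i >= baseTableFieldCount);
}
```

If the projection can sit several operators above the scan, the same bookkeeping becomes much harder, which is essentially the reviewer's point about leaving this to `HiveRelFieldTrimmer` or a dedicated CBO rule.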
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########

@@ -256,10 +257,20 @@ private RelDataType getRetType(RelOptCluster cluster, RelNode inputRel,
     Preconditions.checkState(retType.isStruct());
     // Add the type names and values from the udtf into the lists that will make up the
-    // return type.
+    // return type. Names need to be unique so add the table prefix

Review Comment:
   Shouldn't we do this disambiguation inside `getColumnAliasesFromASTNode` so that other places where we use `columnAliases` have the same content?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########

@@ -124,9 +124,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall = cluster.getRexBuilder().makeCall(tfs.getRowType(),
         inlineCall.op, newArrayCall);
+    // Use null for columnMappings. The return row type of the RelNode now comprises of

Review Comment:
   According to the javadoc of TableFunctionScan#getColumnMappings() a `null` indicates unknown and is different from empty. What semantics do we want here?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java:
##########

@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes

Review Comment:
   Please highlight the main differences with `org.apache.calcite.rel.rules.FilterTableFunctionTransposeRule` to clarify why a separate class is necessary.
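On the null-versus-empty question raised against `HiveOptimizeInlineArrayTableFunctionRule` above, a short illustration of the distinction the Calcite javadoc draws. Whether an empty set is actually the right choice is the reviewer's open question; the snippet only shows how the alternative would read, using the same `copy` call shape as the quoted diff.

```java
// Per the TableFunctionScan#getColumnMappings() javadoc:
//   null      -> we do not know how the output columns relate to the inputs
//   empty set -> we know that no output column is copied from any input column
// For inline(array(...)) every output field is computed from the UDTF arguments,
// so an explicit empty set arguably states that intent more precisely than null.
final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(),
    newInlineCall, tfs.getElementType(), tfs.getRowType(),
    java.util.Collections.emptySet());
```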
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########

@@ -1024,30 +1022,8 @@ boolean isCBOExecuted() {
 
   @Override
   boolean isCBOSupportedLateralView(ASTNode lateralView) {
-    // Lateral view AST has the following shape:
-    // ^(TOK_LATERAL_VIEW
-    //   ^(TOK_SELECT ^(TOK_SELEXPR ^(TOK_FUNCTION Identifier params) identifier* tableAlias)))
-    if (lateralView.getToken().getType() == HiveParser.TOK_LATERAL_VIEW_OUTER) {
-      // LATERAL VIEW OUTER not supported in CBO
-      return false;
-    }
-    // Only INLINE followed by ARRAY supported in CBO
-    ASTNode lvFunc = (ASTNode) lateralView.getChild(0).getChild(0).getChild(0);
-    String lvFuncName = lvFunc.getChild(0).getText();
-    if (lvFuncName.compareToIgnoreCase(
-        GenericUDTFInline.class.getAnnotation(Description.class).name()) != 0) {
-      return false;
-    }
-    if (lvFunc.getChildCount() != 2) {
-      return false;
-    }

Review Comment:
   Do we support an arbitrary number of children for lateral views? Is this a new feature?

##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########

@@ -114,7 +115,7 @@ public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode input
     this.lateralViewRel = HiveTableFunctionScan.create(cluster,
         TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
-        null, retType, null);
+        null, retType, createColumnMappings(inputRel));

Review Comment:
   It is a bit strange that the column mapping depends only on the `inputRel` and the `udtfCall` does not have any impact on the result. Why is that?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########

@@ -124,9 +124,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall = cluster.getRexBuilder().makeCall(tfs.getRowType(),
         inlineCall.op, newArrayCall);
+    // Use null for columnMappings. The return row type of the RelNode now comprises of
+    // all the fields within the UDTF, so there is no mapping from the output fields
+    // directly to the input fields anymore.
     final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(),
        newInlineCall, tfs.getElementType(), tfs.getRowType(),
-       tfs.getColumnMappings());
+       /*columnMappings*/ null);

Review Comment:
   nit: Putting the name of the variable inside comments is not necessary. IDEs are pretty intelligent these days.
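Returning to the `createColumnMappings(inputRel)` comment on the LateralViewPlan constructor above, one plausible reading of why only `inputRel` matters: the lateral view scan copies the input columns through unchanged and appends the UDTF columns, and only the copied-through columns can have a mapping, so the UDTF call contributes nothing to it. The helper below is a hypothetical sketch under that assumption (imports such as `HashSet` elided), not the PR's actual implementation.

```java
// Hypothetical sketch of createColumnMappings: the first N output columns are
// identity copies of the single input's N columns; the trailing UDTF columns
// have no input counterpart and therefore get no mapping entry.
private Set<RelColumnMapping> createColumnMappings(RelNode inputRel) {
  Set<RelColumnMapping> mappings = new HashSet<>();
  int fieldCount = inputRel.getRowType().getFieldCount();
  for (int i = 0; i < fieldCount; i++) {
    // (output column i, input rel 0, input column i, not derived)
    mappings.add(new RelColumnMapping(i, 0, i, false));
  }
  return mappings;
}
```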