zabetak commented on code in PR #4442:
URL: https://github.com/apache/hive/pull/4442#discussion_r1285620207
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java:
##########

@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes
+ */
+public class HiveFilterTableFunctionTransposeRule extends RelOptRule {
+
+  public static final HiveFilterTableFunctionTransposeRule INSTANCE =
+      new HiveFilterTableFunctionTransposeRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveFilterTableFunctionTransposeRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveFilter.class, operand(HiveTableFunctionScan.class, any())),
+        relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final Filter filterRel = call.rel(0);
+    final HiveTableFunctionScan tfs = call.rel(1);
+
+    RexNode condition = filterRel.getCondition();
+    if (!HiveCalciteUtil.isDeterministic(condition)) {
+      return false;
+    }
+
+    // If the HiveTableFunctionScan is a special inline(array(...))
+    // udtf, the table is generated from this node. The underlying
+    // RelNode will be a dummy table so no filter condition should
+    // pass through the HiveTableFunctionScan.
+    if (isInlineArray(tfs)) {
+      return false;
+    }

Review Comment:
   Why do we need to check explicitly for inlineArray and rely upon FunctionRegistry? Isn't `RelColumnMapping` sufficient to give us this information?
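A minimal sketch of what the reviewer seems to be hinting at: deciding from the scan's own column mappings (Calcite's `TableFunctionScan#getColumnMappings()`) whether any filter can be traced back to an input, instead of consulting the FunctionRegistry. This is an illustration reusing the names and imports from the quoted file, not the PR's implementation, and it assumes the mappings are populated for genuine lateral views and left null or empty for the synthetic inline(array(...)) scan over a dummy table; whether that assumption holds is exactly the open question.

```java
// Illustrative only: rely on the column mappings instead of isInlineArray().
@Override
public boolean matches(RelOptRuleCall call) {
  final Filter filterRel = call.rel(0);
  final HiveTableFunctionScan tfs = call.rel(1);

  if (!HiveCalciteUtil.isDeterministic(filterRel.getCondition())) {
    return false;
  }
  // null = mapping unknown, empty = no output column is copied from an input:
  // in either case there is no input column to safely push the filter onto.
  Set<RelColumnMapping> mappings = tfs.getColumnMappings();
  return mappings != null && !mappings.isEmpty();
}
```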
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
         ast = ASTBuilder.subQuery(left, sqAlias);
         s = new Schema((Union) r, sqAlias);
       }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {
+      // In the case where the RelNode is a HiveTableFunctionScan, first we check
+      // to see if we can't optimize out the lateral view operator. We can optimize the
+      // operator out if only the udtf fields are grabbed out of the RelNode. If any
+      // of the base table fields need to be grabbed out, then a 'join' needs to be done
+      // and we need the lateral view.
+      //
+      // The structure of the AST in this "else if" branch will be:

Review Comment:
   Would it make sense to refactor this new code into a separate method? Maybe also add a test similar to `TestASTConverter` to enforce the expected output.

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -621,6 +703,45 @@ private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
     }
   }
+
+  /**
+   * Check to see if we can optimize out the lateral view operators
+   * We do not need to use the lateral view syntax if all of the fields
+   * selected out of the table scan come from the UDTF call. No join
+   * is needed because all the fields come from the table level rather
+   * than the row level.
+   */
+  private boolean canOptimizeOutLateralView(HiveTableFunctionScan htfs) {
+    Set<Integer> inputRefs = new HashSet<>();
+    if (this.select instanceof HiveProject) {

Review Comment:
   Is `this` operator always the parent of `htfs`, or can it be multiple levels up (grand-grand-...-parent)? If it is just the parent then I again get the feeling that this should be a separate CBO rule. If we are talking about operators multiple levels up then I am not sure what inputRefs refer to. The code here looks like something that the `HiveRelFieldTrimmer` should do before dropping unnecessary references.

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java:
##########

@@ -382,6 +387,11 @@ private static boolean validExchangeChild(HiveSortExchange sortNode) {
     return sortNode.getInput() instanceof Project;
   }
+
+  private static boolean validTableFunctionScanChild(HiveTableFunctionScan htfsNode) {
+    return htfsNode.getInputs().size() == 1 &&
+        (htfsNode.getInput(0) instanceof Project || htfsNode.getInput(0) instanceof HiveTableScan);

Review Comment:
   I am not sure how this will work if we are transposing Filters (and potentially other operators) below a `HiveTableFunctionScan`. Is htfsNode.getInputs().size() > 1 possible? Should we put guards against this case somewhere in the code?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########

@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
         ast = ASTBuilder.subQuery(left, sqAlias);
         s = new Schema((Union) r, sqAlias);
       }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {

Review Comment:
   I didn't follow all the details about this `canOptimizeOutLateralView` method and related code, but the comment gives me the impression that this is a CBO-level optimization. Would it be possible to model this with a CBO rule?
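On the `inputRefs` question raised against `canOptimizeOutLateralView` above, a rough sketch of what the check could look like when the parent of the scan is a `HiveProject`. It assumes, as the quoted LateralViewPlan code suggests, that the scan's row type lists the base-table columns first followed by the UDTF columns; the helper body, the use of `this.select`, and the omitted imports (`RexInputRef`, `RexVisitorImpl`, `HashSet`) are illustrative assumptions about the PR, not verified code.

```java
// Illustrative sketch: the lateral view can be dropped when the parent Project
// only references columns produced by the UDTF, i.e. columns positioned after
// the base-table columns in the HiveTableFunctionScan row type.
private boolean canOptimizeOutLateralView(HiveTableFunctionScan htfs) {
  if (!(this.select instanceof HiveProject)) {
    return false;
  }
  // Number of columns coming from the base table (the scan's single input).
  final int baseTableFieldCount = htfs.getInput(0).getRowType().getFieldCount();

  final Set<Integer> inputRefs = new HashSet<>();
  for (RexNode expr : ((HiveProject) this.select).getProjects()) {
    expr.accept(new RexVisitorImpl<Void>(true) {
      @Override
      public Void visitInputRef(RexInputRef ref) {
        inputRefs.add(ref.getIndex());
        return null;
      }
    });
  }
  // Only UDTF columns referenced -> no join with the base table is needed.
  return inputRefs.stream().allMatch(i -> i >= baseTableFieldCount);
}
```

If the projection can sit several operators above the scan, the same bookkeeping becomes much harder, which is essentially the reviewer's point about leaving this to `HiveRelFieldTrimmer` or a dedicated CBO rule.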
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########

@@ -256,10 +257,20 @@ private RelDataType getRetType(RelOptCluster cluster, RelNode inputRel,
     Preconditions.checkState(retType.isStruct());
     // Add the type names and values from the udtf into the lists that will make up the
-    // return type.
+    // return type. Names need to be unique so add the table prefix

Review Comment:
   Shouldn't we do this disambiguation inside `getColumnAliasesFromASTNode` so that other places where we use `columnAliases` have the same content?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########

@@ -124,9 +124,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall = cluster.getRexBuilder().makeCall(tfs.getRowType(),
         inlineCall.op, newArrayCall);
+    // Use null for columnMappings. The return row type of the RelNode now comprises of

Review Comment:
   According to the javadoc of TableFunctionScan#getColumnMappings() a `null` indicates unknown and is different from empty. What semantics do we want here?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java:
##########

@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes

Review Comment:
   Please highlight the main differences with `org.apache.calcite.rel.rules.FilterTableFunctionTransposeRule` to clarify why a separate class is necessary.
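On the null-versus-empty question raised against `HiveOptimizeInlineArrayTableFunctionRule` above, a short illustration of the distinction the Calcite javadoc draws. Whether an empty set is actually the right choice is the reviewer's open question; the snippet only shows how the alternative would read, using the same `copy` call shape as the quoted diff.

```java
// Per the TableFunctionScan#getColumnMappings() javadoc:
//   null      -> we do not know how the output columns relate to the inputs
//   empty set -> we know that no output column is copied from any input column
// For inline(array(...)) every output field is computed from the UDTF arguments,
// so an explicit empty set arguably states that intent more precisely than null.
final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(),
    newInlineCall, tfs.getElementType(), tfs.getRowType(),
    java.util.Collections.emptySet());
```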
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########

@@ -1024,30 +1022,8 @@ boolean isCBOExecuted() {
 
   @Override
   boolean isCBOSupportedLateralView(ASTNode lateralView) {
-    // Lateral view AST has the following shape:
-    // ^(TOK_LATERAL_VIEW
-    //   ^(TOK_SELECT ^(TOK_SELEXPR ^(TOK_FUNCTION Identifier params) identifier* tableAlias)))
-    if (lateralView.getToken().getType() == HiveParser.TOK_LATERAL_VIEW_OUTER) {
-      // LATERAL VIEW OUTER not supported in CBO
-      return false;
-    }
-    // Only INLINE followed by ARRAY supported in CBO
-    ASTNode lvFunc = (ASTNode) lateralView.getChild(0).getChild(0).getChild(0);
-    String lvFuncName = lvFunc.getChild(0).getText();
-    if (lvFuncName.compareToIgnoreCase(
-        GenericUDTFInline.class.getAnnotation(Description.class).name()) != 0) {
-      return false;
-    }
-    if (lvFunc.getChildCount() != 2) {
-      return false;
-    }

Review Comment:
   Do we support an arbitrary number of children for lateral views? Is this a new feature?

##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########

@@ -114,7 +115,7 @@ public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode input
     this.lateralViewRel = HiveTableFunctionScan.create(cluster,
         TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
-        null, retType, null);
+        null, retType, createColumnMappings(inputRel));

Review Comment:
   It is a bit strange that the column mapping depends only on the `inputRel` and the `udtfCall` does not have any impact on the result. Why is that?

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########

@@ -124,9 +124,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall = cluster.getRexBuilder().makeCall(tfs.getRowType(),
         inlineCall.op, newArrayCall);
+    // Use null for columnMappings. The return row type of the RelNode now comprises of
+    // all the fields within the UDTF, so there is no mapping from the output fields
+    // directly to the input fields anymore.
     final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(),
        newInlineCall, tfs.getElementType(), tfs.getRowType(),
-       tfs.getColumnMappings());
+       /*columnMappings*/ null);

Review Comment:
   nit: Putting the name of the variable inside comments is not necessary. IDEs are pretty intelligent these days.
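Returning to the `createColumnMappings(inputRel)` comment on the LateralViewPlan constructor above, one plausible reading of why only `inputRel` matters: the lateral view scan copies the input columns through unchanged and appends the UDTF columns, and only the copied-through columns can have a mapping, so the UDTF call contributes nothing to it. The helper below is a hypothetical sketch under that assumption (imports such as `HashSet` elided), not the PR's actual implementation.

```java
// Hypothetical sketch of createColumnMappings: the first N output columns are
// identity copies of the single input's N columns; the trailing UDTF columns
// have no input counterpart and therefore get no mapping entry.
private Set<RelColumnMapping> createColumnMappings(RelNode inputRel) {
  Set<RelColumnMapping> mappings = new HashSet<>();
  int fieldCount = inputRel.getRowType().getFieldCount();
  for (int i = 0; i < fieldCount; i++) {
    // (output column i, input rel 0, input column i, not derived)
    mappings.add(new RelColumnMapping(i, 0, i, false));
  }
  return mappings;
}
```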