zabetak commented on code in PR #4442:
URL: https://github.com/apache/hive/pull/4442#discussion_r1288646268
##########
ql/src/test/queries/clientpositive/lateral_view.q:
##########
@@ -10,6 +10,11 @@ EXPLAIN SELECT myTable.* FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS
 EXPLAIN SELECT myTable.myCol, myTable2.myCol2 FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS myCol LATERAL VIEW explode(array('a', 'b', 'c')) myTable2 AS myCol2 LIMIT 9;
 EXPLAIN SELECT myTable2.* FROM src LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 LIMIT 3;
+EXPLAIN CBO SELECT * FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS myCol SORT BY key ASC, myCol ASC LIMIT 1;
Review Comment:
It is a good idea to verify the `EXPLAIN CBO` plan in the presence of lateral views. I am wondering if it would be better to identify a few representative queries and put the EXPLAIN CBO statements in a new .q file (e.g., `cbo_lateral_views.q`), going from the simpler to the more complicated queries. That would make the testing more explicit and would also reduce the changes in existing .q.out files, leading to fewer merge conflicts when backporting. WDYT?
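To make the suggestion concrete, the new file could start from the queries already touched by this diff (file name and query selection are just a sketch, not a prescription):

```sql
-- cbo_lateral_views.q (hypothetical), ordered from simple to complex:

-- single lateral view
EXPLAIN CBO SELECT myTable.* FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS myCol;

-- two stacked lateral views
EXPLAIN CBO SELECT myTable.myCol, myTable2.myCol2 FROM src
  LATERAL VIEW explode(array(1,2,3)) myTable AS myCol
  LATERAL VIEW explode(array('a', 'b', 'c')) myTable2 AS myCol2 LIMIT 9;

-- lateral view referencing the column produced by a previous lateral view
EXPLAIN CBO SELECT myTable2.* FROM src
  LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol
  LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 LIMIT 3;
```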
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -124,9 +125,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall =
         cluster.getRexBuilder().makeCall(tfs.getRowType(), inlineCall.op, newArrayCall);
+    // Use an empty set for columnMappings. The return row type of the RelNode now comprises
+    // all the fields within the UDTF, so there is no mapping from the output fields
+    // directly to the input fields anymore.
     final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(), newInlineCall, tfs.getElementType(), tfs.getRowType(),
-        tfs.getColumnMappings());
+        new HashSet<>());
Review Comment:
nit: `Collections.emptySet()` is a more efficient choice unless we want to
modify the set.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
         ast = ASTBuilder.subQuery(left, sqAlias);
         s = new Schema((Union) r, sqAlias);
       }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {
Review Comment:
Is this optimization something that we could defer to a follow-up ticket so that we converge and merge this PR faster? That would also make the diff and git history easier to follow.
If we want to push it here, then let me share what I was thinking with a concrete SQL example grabbed from the existing tests.
```sql
SELECT myCol from tmp_pyang_lv LATERAL VIEW explode(array(1,2,3)) myTab as myCol limit 3
```
The current plan is the following:
```
CBO PLAN:
HiveSortLimit(fetch=[3])
  HiveProject(mycol=[$5])
    HiveTableFunctionScan(invocation=[explode(ARRAY(1, 2, 3))], rowType=[RecordType(VARCHAR(2147483647) inputs, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED, INTEGER mytab.mycol)])
      HiveTableScan(table=[[default, tmp_pyang_lv]], table:alias=[tmp_pyang_lv])
```
I was thinking that we could have a CBO optimization that turns the plan
above to something like the following:
```
CBO PLAN:
HiveSortLimit(fetch=[3])
  HiveTableFunctionScan(invocation=[explode(ARRAY(1, 2, 3))], rowType=[RecordType(INTEGER mytab.mycol)])
```
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -621,6 +644,109 @@ private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
     }
   }
+  private ASTNode createASTLateralView(TableFunctionScan tfs, Schema s,
+      QueryBlockInfo tableFunctionSource, String sqAlias) {
+    // In the case where the RelNode is a HiveTableFunctionScan, first we check
+    // to see if we can optimize out the lateral view operator. We can optimize the
+    // operator out if only the udtf fields are grabbed out of the RelNode. If any
+    // of the base table fields need to be grabbed out, then a 'join' needs to be done
+    // and we need the lateral view.
+    //
+ //
Review Comment:
The comment seems misplaced here. Maybe it is also a bit redundant given the javadoc in the `canOptimizeOutLateralView` method.
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -114,7 +115,7 @@ public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode input
     this.lateralViewRel = HiveTableFunctionScan.create(cluster,
         TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
-        null, retType, null);
+        null, retType, createColumnMappings(inputRel));
Review Comment:
I was thinking that there is also a third category:
3. Columns from the `inputRel` that are transformed by the UDTF and appear in the result (something that would enable the `derived` attribute in `RelColumnMapping`).
Anyway, I don't fully understand how `RelColumnMapping` is supposed to work, so I suppose we can live with this mapping for the moment.
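For the record, this is the kind of query I had in mind for the third category (the table `t` and its columns are hypothetical, not taken from the patch):

```sql
-- myCol is produced by applying the UDTF to an input column (t.arr),
-- so it is "derived" from the input rather than copied or unrelated;
-- a RelColumnMapping with derived=true could in principle describe it.
SELECT t.id, x.myCol
FROM t LATERAL VIEW explode(t.arr) x AS myCol;
```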
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java:
##########
@@ -382,6 +387,11 @@ private static boolean validExchangeChild(HiveSortExchange sortNode) {
     return sortNode.getInput() instanceof Project;
   }
+  private static boolean validTableFunctionScanChild(HiveTableFunctionScan htfsNode) {
+    return htfsNode.getInputs().size() == 1 &&
+        (htfsNode.getInput(0) instanceof Project || htfsNode.getInput(0) instanceof HiveTableScan);
Review Comment:
Thanks for the explanation. I don't fully understand it either, but now I see the pattern, so the changes look reasonable.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java:
##########
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes.
+ *
+ * This differs from the FilterTableFunctionTransposeRule for a number of reasons:
+ * - We check if the filter condition is deterministic under Hive rules. We cannot
+ *   push filters that aren't deterministic.
+ * - We check if the filter only references columns that are not part of the udtf.
+ *   If the input ref in the filter references a column that is generated by the
+ *   udtf, the filter cannot be pushed through.
Review Comment:
I get the impression that `FilterTableFunctionTransposeRule` also does the same check using column mappings.
I would also suggest isolating this transformation in a separate patch if that is possible. The reasoning is the same: less code to review, faster to merge.
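To illustrate the two checks with hypothetical queries over a table `t(id INT, arr ARRAY<INT>)` (table and columns are made up for the example):

```sql
-- Filter on a base-table column only: deterministic and independent of the
-- UDTF output, so it can be pushed below the table function scan.
SELECT id, myCol FROM t LATERAL VIEW explode(arr) x AS myCol WHERE id > 10;

-- Filter referencing the UDTF-generated column: cannot be pushed through,
-- since myCol does not exist below the table function scan.
SELECT id, myCol FROM t LATERAL VIEW explode(arr) x AS myCol WHERE myCol > 10;

-- Non-deterministic filter: cannot be pushed either, because pushing it below
-- the UDTF changes how many times rand() is evaluated per input row.
SELECT id, myCol FROM t LATERAL VIEW explode(arr) x AS myCol WHERE rand() < 0.5;
```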
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]