[jira] [Work logged] (HIVE-26524) Use Calcite to remove sections of a query plan known never produces rows

ASF GitHub Bot (Jira) Thu, 29 Sep 2022 14:52:10 -0700


     [ https://issues.apache.org/jira/browse/HIVE-26524?focusedWorklogId=813509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-813509 ]


ASF GitHub Bot logged work on HIVE-26524:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Sep/22 21:49
            Start Date: 29/Sep/22 21:49
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on code in PR #3588:
URL: https://github.com/apache/hive/pull/3588#discussion_r983479717


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/Bug.java:
##########
@@ -74,4 +74,14 @@ public final class Bug {
    * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-4704">CALCITE-4704</a> is fixed.
    */
   public static final boolean CALCITE_4704_FIXED = false;
+
+  /**
+   * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-4704">CALCITE-5293</a> is fixed.

Review Comment:
   nit: The link points to the wrong JIRA (the `href` targets CALCITE-4704 instead of CALCITE-5293).



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/Bug.java:
##########
@@ -74,4 +74,14 @@ public final class Bug {
    * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-4704">CALCITE-4704</a> is fixed.
    */
   public static final boolean CALCITE_4704_FIXED = false;
+
+  /**
+   * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-4704">CALCITE-5293</a> is fixed.
+   */
+  public static final boolean CALCITE_5293_FIXED = false;
+
+  /**
+   * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-4704">CALCITE-5294</a> is fixed.

Review Comment:
   nit: The link points to the wrong JIRA (the `href` targets CALCITE-4704 instead of CALCITE-5294).
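For reference, the two new flags with each link pointing at the issue named in its text might read as follows. This is a trimmed, hypothetical stand-in for Hive's `Bug` class (the existing CALCITE-4704 flag is omitted), and `BugFlagUsage.chooseRulePath` is an invented call site that only illustrates how such flags are typically consulted to keep a workaround in place until the upstream fix is available.

```java
// Trimmed, hypothetical stand-in for Hive's Bug class: only the two flags
// added in this PR are shown, each linking to the issue it names.
final class Bug {
  private Bug() {
  }

  /**
   * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-5293">CALCITE-5293</a> is
   * fixed.
   */
  public static final boolean CALCITE_5293_FIXED = false;

  /**
   * Whether <a href="https://issues.apache.org/jira/browse/CALCITE-5294">CALCITE-5294</a> is
   * fixed.
   */
  public static final boolean CALCITE_5294_FIXED = false;
}

final class BugFlagUsage {
  // Hypothetical call site: the flag selects the Hive-side workaround until
  // the upstream Calcite fix is available, then the default path.
  static String chooseRulePath() {
    if (!Bug.CALCITE_5293_FIXED) {
      return "hive-workaround";
    }
    return "calcite-default";
  }

  public static void main(String[] args) {
    System.out.println(chooseRulePath());
  }
}
```

Running `main` prints `hive-workaround`, since both flags are still `false`.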



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -121,7 +127,87 @@ public static ASTNode convert(final RelNode relNode, List<FieldSchema> resultSch
     return c.convert();
   }
 
+  //    TOK_QUERY
+  //      TOK_INSERT
+  //         TOK_DESTINATION
+  //            TOK_DIR
+  //               TOK_TMP_FILE
+  //         TOK_SELECT
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               alias0
+  //            ...
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               aliasn
+  //         TOK_LIMIT
+  //            0
+  //            0
+  public static ASTNode emptyPlan(RelDataType dataType) {
+    if (dataType.getFieldCount() == 0) {
+      throw new IllegalArgumentException("Schema is empty.");
+    }
+
+    ASTBuilder select = ASTBuilder.construct(HiveParser.TOK_SELECT, "TOK_SELECT");
+    for (int i = 0; i < dataType.getFieldCount(); ++i) {
+      RelDataTypeField fieldType = dataType.getFieldList().get(i);
+      if (fieldType.getValue().getSqlTypeName() == SqlTypeName.NULL) {
+        select.add(ASTBuilder.selectExpr(
+                ASTBuilder.construct(HiveParser.TOK_NULL, "TOK_NULL").node(),
+                fieldType.getName()));
+      } else {
+        ASTNode typeNode = createCast(fieldType);
+        select.add(ASTBuilder.selectExpr(
+                ASTBuilder.construct(HiveParser.TOK_FUNCTION, "TOK_FUNCTION")
+                        .add(typeNode)
+                        .add(ASTBuilder.construct(HiveParser.TOK_NULL, "TOK_NULL").node()).node(),
+                fieldType.getName()));
+      }
+    }
+
+    ASTNode insert = ASTBuilder.
+            construct(HiveParser.TOK_INSERT, "TOK_INSERT").
+            add(ASTBuilder.destNode()).
+            add(select).
+            add(ASTBuilder.limit(0, 0)).
+            node();
+
+    return ASTBuilder.
+            construct(HiveParser.TOK_QUERY, "TOK_QUERY").
+            add(insert).
+            node();
+  }
+
+  private static ASTNode createCast(RelDataTypeField fieldType) {
+    HiveToken ht = TypeConverter.hiveToken(fieldType.getType());
+    ASTNode typeNode;
+    if (ht == null) {
+      typeNode = ASTBuilder.construct(
+              HiveParser.Identifier, fieldType.getType().getSqlTypeName().getName().toLowerCase()).node();
+    } else {
+      ASTBuilder typeNodeBuilder = ASTBuilder.construct(ht.type, ht.text);
+      if (ht.args != null) {
+        for (String castArg : ht.args) {
+          typeNodeBuilder.add(HiveParser.Identifier, castArg);
+        }
+      }
+      typeNode = typeNodeBuilder.node();
+    }
+    return typeNode;
+  }
+
   private ASTNode convert() throws CalciteSemanticException {
+    if (root instanceof HiveValues) {
+      HiveValues values = (HiveValues) root;
+      if (isEmpty(values)) {
+        select = values;
+        return emptyPlan(values.getRowType());
+      }

Review Comment:
   If we have a `HiveValues` that is not empty, can the code below handle it? If not, I would put an assertion with a proper comment.
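One way to act on this suggestion is sketched below, assuming that only empty `HiveValues` are meant to reach `convert()` at this stage (the `HiveValuesVisitor` in this PR likewise rejects non-empty ones). `FakeValues` and `EmptyValuesGuard` are hypothetical, dependency-free stand-ins for `HiveValues` and the new branch. The sketch throws `IllegalStateException` instead of using `assert`; that is an assumed design choice which keeps the check active when the JVM runs without `-ea`.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HiveValues: only the tuple list matters here.
final class FakeValues {
  final List<? extends List<?>> tuples;

  FakeValues(List<? extends List<?>> tuples) {
    this.tuples = tuples;
  }
}

final class EmptyValuesGuard {
  // Mirrors the shape of the new branch in ASTConverter.convert(): an empty
  // Values becomes the "empty plan", anything else is rejected explicitly.
  static String convertRoot(FakeValues values) {
    if (values.tuples.isEmpty()) {
      return "emptyPlan";
    }
    // Assumption: only empty Values (produced by the empty-pruning rules) are
    // expected here; non-empty literal rows have no AST translation yet, so
    // fail with a clear message instead of falling through to code below.
    throw new IllegalStateException(
        "Only empty HiveValues can be converted to AST, found "
            + values.tuples.size() + " tuple(s)");
  }

  public static void main(String[] args) {
    System.out.println(convertRoot(new FakeValues(Collections.emptyList())));
  }
}
```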



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveValues.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Values;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexLiteral;
+
+import java.util.List;
+
+/**
+ * Subclass of {@link org.apache.calcite.rel.core.Values}.
+ * Targeting Hive engine.
+ */
+public class HiveValues extends Values {
+
+  public HiveValues(
+          RelOptCluster cluster,
+          RelDataType rowType,
+          ImmutableList<ImmutableList<RexLiteral>> tuples,
+          RelTraitSet traits) {
+    super(cluster, rowType, tuples, traits);
+  }
+
+  @Override
+  public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
+    if (getInputs().equals(inputs) && traitSet.equals(getTraitSet())) {
+      return this;
+    }
+

Review Comment:
   I think the Javadoc of `copy` implies that the method creates a new instance, so it is better to avoid returning `this`.
   
   If we want to be extra cautious, we can introduce assertions similar to those in `LogicalValues`.
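For comparison, Calcite's `LogicalValues.copy` asserts that the input list is empty and then constructs a new node. The sketch below mirrors that shape on a hypothetical, dependency-free stand-in (`LeafValuesNode`, with the trait set reduced to a `String`). In `HiveValues` itself the return statement would presumably be `new HiveValues(getCluster(), getRowType(), tuples, traitSet)`, using the constructor from the hunk above.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a leaf operator such as HiveValues: no inputs,
// a trait set (modelled as a String) and an immutable list of tuples.
final class LeafValuesNode {
  final String traits;
  final List<? extends List<?>> tuples;

  LeafValuesNode(String traits, List<? extends List<?>> tuples) {
    this.traits = traits;
    this.tuples = tuples;
  }

  // Always returns a fresh instance instead of `this`, after checking the
  // leaf-node invariant (no inputs) that LogicalValues also asserts.
  LeafValuesNode copy(String traitSet, List<?> inputs) {
    if (!inputs.isEmpty()) {
      throw new IllegalArgumentException(
          "Values has no inputs, but copy received " + inputs.size());
    }
    return new LeafValuesNode(traitSet, tuples);
  }

  public static void main(String[] args) {
    LeafValuesNode original = new LeafValuesNode("HIVE", Collections.emptyList());
    LeafValuesNode copy = original.copy("HIVE", Collections.emptyList());
    System.out.println(copy != original); // true
  }
}
```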



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -121,7 +127,87 @@ public static ASTNode convert(final RelNode relNode, List<FieldSchema> resultSch
     return c.convert();
   }
 
+  //    TOK_QUERY
+  //      TOK_INSERT
+  //         TOK_DESTINATION
+  //            TOK_DIR
+  //               TOK_TMP_FILE
+  //         TOK_SELECT
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               alias0
+  //            ...
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               aliasn
+  //         TOK_LIMIT
+  //            0
+  //            0
+  public static ASTNode emptyPlan(RelDataType dataType) {
+    if (dataType.getFieldCount() == 0) {
+      throw new IllegalArgumentException("Schema is empty.");
+    }
+
+    ASTBuilder select = ASTBuilder.construct(HiveParser.TOK_SELECT, "TOK_SELECT");
+    for (int i = 0; i < dataType.getFieldCount(); ++i) {
+      RelDataTypeField fieldType = dataType.getFieldList().get(i);
+      if (fieldType.getValue().getSqlTypeName() == SqlTypeName.NULL) {
+        select.add(ASTBuilder.selectExpr(
+                ASTBuilder.construct(HiveParser.TOK_NULL, "TOK_NULL").node(),
+                fieldType.getName()));
+      } else {
+        ASTNode typeNode = createCast(fieldType);
+        select.add(ASTBuilder.selectExpr(
+                ASTBuilder.construct(HiveParser.TOK_FUNCTION, "TOK_FUNCTION")
+                        .add(typeNode)
+                        .add(ASTBuilder.construct(HiveParser.TOK_NULL, "TOK_NULL").node()).node(),
+                fieldType.getName()));
+      }
+    }
+
+    ASTNode insert = ASTBuilder.
+            construct(HiveParser.TOK_INSERT, "TOK_INSERT").
+            add(ASTBuilder.destNode()).
+            add(select).
+            add(ASTBuilder.limit(0, 0)).
+            node();
+
+    return ASTBuilder.
+            construct(HiveParser.TOK_QUERY, "TOK_QUERY").
+            add(insert).
+            node();
+  }
+
+  private static ASTNode createCast(RelDataTypeField fieldType) {

Review Comment:
   Do we handle STRUCT types? Should we?
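If `TypeConverter.hiveToken` returns `null` for a STRUCT (not verified here), the fallback branch of `createCast` builds the type node from the SQL type name alone, which would drop the field names and field types. Supporting STRUCT would therefore need the full nested type, for example in Hive's `struct<name:type,...>` notation. The sketch below shows only that recursive rendering, on a hypothetical miniature type model (`MiniType`, `MiniField` and `StructTypeText` are not Hive or Calcite classes); whether Hive accepts a cast to such a type at this point in the AST is not established here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical miniature type model standing in for RelDataType.
final class MiniType {
  final String name;            // e.g. "int", "string" or "struct"
  final List<MiniField> fields; // non-empty only for "struct"

  private MiniType(String name, List<MiniField> fields) {
    this.name = name;
    this.fields = fields;
  }

  static MiniType atomic(String name) {
    return new MiniType(name, Collections.emptyList());
  }

  static MiniType struct(MiniField... fields) {
    return new MiniType("struct", Arrays.asList(fields));
  }
}

final class MiniField {
  final String name;
  final MiniType type;

  MiniField(String name, MiniType type) {
    this.name = name;
    this.type = type;
  }
}

final class StructTypeText {
  // Renders a type in Hive's struct<name:type,...> notation, recursing into
  // nested struct fields so that no field name or field type is lost.
  static String render(MiniType type) {
    if (!"struct".equals(type.name)) {
      return type.name;
    }
    StringJoiner parts = new StringJoiner(",", "struct<", ">");
    for (MiniField field : type.fields) {
      parts.add(field.name + ":" + render(field.type));
    }
    return parts.toString();
  }

  public static void main(String[] args) {
    MiniType t = MiniType.struct(
        new MiniField("a", MiniType.atomic("int")),
        new MiniField("b", MiniType.struct(new MiniField("c", MiniType.atomic("string")))));
    System.out.println(render(t)); // struct<a:int,b:struct<c:string>>
  }
}
```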



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/opconventer/HiveValuesVisitor.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.translator.opconventer;
+
+import org.apache.calcite.rel.core.Values;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveValues;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.opconventer.HiveOpConverter.OpAttr;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.LimitDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableScanDesc;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+class HiveValuesVisitor extends HiveRelNodeVisitor<HiveValues> {
+  HiveValuesVisitor(HiveOpConverter hiveOpConverter) {
+    super(hiveOpConverter);
+  }
+
+  @Override
+  OpAttr visit(HiveValues valuesRel) throws SemanticException {
+
+    LOG.debug("Translating operator rel#{}:{} with row type: [{}]",
+            valuesRel.getId(), valuesRel.getRelTypeName(), valuesRel.getRowType());
+    LOG.debug("Operator rel#{}:{} has {} tuples.",
+            valuesRel.getId(), valuesRel.getRelTypeName(), valuesRel.tuples.size());
+
+    if (!Values.isEmpty(valuesRel)) {
+      LOG.error("Empty {} operator translation not supported yet in return path.",
+              valuesRel.getClass().getCanonicalName());
+      return null;
+    }
+
+    // 1. collect columns for project row schema

Review Comment:
   nit: I tend to avoid enumerations in comments, because as code evolves and steps are added or removed, people will probably update the numbers, leading to more changes than necessary.
   
   Moreover, the comment would be completely redundant (and the code possibly more readable) if you created a function such as `List<ExprNodeDesc> createColumnsForProject`.
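Following the suggestion, the numbered step can become a named helper. The body of that step is not shown in the hunk, so the sketch assumes (as the `ExprNodeConstantDesc` and `TypeConverter` imports hint) that it produces one typed NULL constant per output field. `Field` and `NullConstant` are hypothetical stand-ins for `RelDataTypeField` and `ExprNodeConstantDesc`; only the method name `createColumnsForProject` comes from the review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ValuesProjectSketch {
  // Hypothetical stand-in for RelDataTypeField: a column name and a type name.
  static final class Field {
    final String name;
    final String typeName;

    Field(String name, String typeName) {
      this.name = name;
      this.typeName = typeName;
    }
  }

  // Hypothetical stand-in for an ExprNodeConstantDesc whose value is NULL.
  static final class NullConstant {
    final String typeName;

    NullConstant(String typeName) {
      this.typeName = typeName;
    }
  }

  // The former "// 1. collect columns for project row schema" step: the
  // method name documents the step, so no numbered comment is needed.
  static List<NullConstant> createColumnsForProject(List<Field> fields) {
    List<NullConstant> columns = new ArrayList<>(fields.size());
    for (Field field : fields) {
      columns.add(new NullConstant(field.typeName));
    }
    return columns;
  }

  public static void main(String[] args) {
    List<NullConstant> columns = createColumnsForProject(
        Arrays.asList(new Field("a1", "int"), new Field("b1", "string")));
    System.out.println(columns.size()); // 2
  }
}
```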



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -121,7 +127,87 @@ public static ASTNode convert(final RelNode relNode, List<FieldSchema> resultSch
     return c.convert();
   }
 
+  //    TOK_QUERY
+  //      TOK_INSERT
+  //         TOK_DESTINATION
+  //            TOK_DIR
+  //               TOK_TMP_FILE
+  //         TOK_SELECT
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               alias0
+  //            ...
+  //            TOK_SELEXPR
+  //               TOK_FUNCTION
+  //                  TOK_<type>
+  //                  TOK_NULL
+  //               aliasn
+  //         TOK_LIMIT
+  //            0
+  //            0
+  public static ASTNode emptyPlan(RelDataType dataType) {

Review Comment:
   Since this is public, I would put the plan in the Javadoc and also add an explanation of why this particular AST representation was chosen. Not everybody knows that this is not going to be executed by Hive due to subsequent optimizations.
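Based on the tree in the comment, the AST built by `emptyPlan` corresponds roughly to the query `SELECT CAST(NULL AS <type>) AS alias0, ..., CAST(NULL AS <type>) AS aliasn LIMIT 0`: each typed NULL preserves the name and type of one output column (NULL-typed columns get a bare `NULL`), and `LIMIT 0` guarantees that no rows are produced. A Javadoc could state exactly that. The hypothetical helper below renders that equivalent HiveQL text from (name, type) pairs; it is an illustration only, since `emptyPlan` builds `ASTNode`s directly and never goes through query text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class EmptyPlanText {
  // Hypothetical (name, type) pair; a null type stands for SqlTypeName.NULL.
  static final class Column {
    final String name;
    final String type;

    Column(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  // Renders HiveQL equivalent to the emptyPlan AST: one typed NULL per column
  // (a bare NULL for NULL-typed columns) and LIMIT 0, so no rows are returned.
  static String render(List<Column> columns) {
    if (columns.isEmpty()) {
      throw new IllegalArgumentException("Schema is empty.");
    }
    StringJoiner select = new StringJoiner(", ", "SELECT ", " LIMIT 0");
    for (Column column : columns) {
      String value = column.type == null ? "NULL" : "CAST(NULL AS " + column.type + ")";
      select.add(value + " AS `" + column.name + "`");
    }
    return select.toString();
  }

  public static void main(String[] args) {
    System.out.println(render(Arrays.asList(new Column("y", "int"), new Column("n", null))));
    // prints: SELECT CAST(NULL AS int) AS `y`, NULL AS `n` LIMIT 0
  }
}
```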



##########
ql/src/test/queries/clientpositive/antijoin.q:
##########
@@ -44,7 +44,7 @@ explain cbo select a.key, a.value from t1_n55 a left join t2_n33 b on a.key=b.ke
 select a.key, a.value from t1_n55 a left join t2_n33 b on a.key=b.key join t3_n12 c on a.key=c.key where b.key is null  sort by a.key, a.value;
 
 

Issue Time Tracking
-------------------

    Worklog Id:     (was: 813509)
    Time Spent: 4h 20m  (was: 4h 10m)

> Use Calcite to remove sections of a query plan known never produces rows
> ------------------------------------------------------------------------
>
>                 Key: HIVE-26524
>                 URL: https://issues.apache.org/jira/browse/HIVE-26524
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Calcite has a set of rules to remove sections of a query plan that are known
> to never produce any rows. In some cases the whole plan can be removed. Such
> plans are represented by a single {{Values}} operator with no tuples, e.g.:
> {code:java}
> select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
> {code}
> {code:java}
> HiveValues(tuples=[[]])
> {code}
> In other cases, when the plan has outer joins or set operators, some branches
> can be replaced with empty values, and in some cases the join/set operator
> itself can then be removed, e.g.:
> {code:java}
> select a2, b2 from t2 where 1=0
> union
> select a1, b1 from t1
> {code}
> {code:java}
> HiveAggregate(group=[{0, 1}])
>   HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
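The rewrite described in the issue can be illustrated with a toy bottom-up pass over a hypothetical plan model (`PruneEmptySketch` and its nested classes are invented here; this is not Calcite's `PruneEmptyRules` or Hive's operator tree). A filter whose condition is always false becomes an empty Values, a project, filter or distinct over an empty input becomes empty, and a two-input union drops its empty branches. When a non-ALL UNION keeps a single branch, the sketch wraps it in `Distinct`, playing the role of the `HiveAggregate(group=[{0, 1}])` in the second example, because duplicates must still be removed.

```java
// Toy model of empty-branch pruning; every class here is hypothetical.
final class PruneEmptySketch {
  abstract static class Plan {
    // Returns an equivalent plan with empty sections removed (bottom-up).
    abstract Plan prune();

    boolean isEmpty() {
      return false;
    }
  }

  // Stands in for a Values operator with no tuples.
  static final class Empty extends Plan {
    @Override Plan prune() { return this; }
    @Override boolean isEmpty() { return true; }
    @Override public String toString() { return "Values(tuples=[])"; }
  }

  static final class Scan extends Plan {
    final String table;
    Scan(String table) { this.table = table; }
    @Override Plan prune() { return this; }
    @Override public String toString() { return "Scan(" + table + ")"; }
  }

  static final class Filter extends Plan {
    final boolean alwaysFalse; // e.g. WHERE 1=0
    final Plan input;
    Filter(boolean alwaysFalse, Plan input) { this.alwaysFalse = alwaysFalse; this.input = input; }
    @Override Plan prune() {
      Plan in = input.prune();
      return alwaysFalse || in.isEmpty() ? new Empty() : new Filter(false, in);
    }
    @Override public String toString() { return "Filter(" + input + ")"; }
  }

  static final class Project extends Plan {
    final Plan input;
    Project(Plan input) { this.input = input; }
    @Override Plan prune() {
      Plan in = input.prune();
      return in.isEmpty() ? new Empty() : new Project(in);
    }
    @Override public String toString() { return "Project(" + input + ")"; }
  }

  static final class Distinct extends Plan {
    final Plan input;
    Distinct(Plan input) { this.input = input; }
    @Override Plan prune() {
      Plan in = input.prune();
      return in.isEmpty() ? new Empty() : new Distinct(in);
    }
    @Override public String toString() { return "Distinct(" + input + ")"; }
  }

  static final class Union extends Plan {
    final boolean all; // UNION ALL keeps duplicates, UNION removes them
    final Plan left;
    final Plan right;
    Union(boolean all, Plan left, Plan right) { this.all = all; this.left = left; this.right = right; }
    @Override Plan prune() {
      Plan l = left.prune();
      Plan r = right.prune();
      if (l.isEmpty() && r.isEmpty()) {
        return new Empty();
      }
      if (l.isEmpty() || r.isEmpty()) {
        Plan survivor = l.isEmpty() ? r : l;
        // A distinct UNION must still deduplicate the surviving branch.
        return all ? survivor : new Distinct(survivor);
      }
      return new Union(all, l, r);
    }
    @Override public String toString() {
      return "Union(" + (all ? "all, " : "") + left + ", " + right + ")";
    }
  }

  public static void main(String[] args) {
    // select a2, b2 from t2 where 1=0 union select a1, b1 from t1
    // (the second branch is simplified to a plain scan)
    Plan q = new Union(false, new Project(new Filter(true, new Scan("t2"))), new Scan("t1"));
    System.out.println(q.prune()); // Distinct(Scan(t1))
  }
}
```

The toy model does not remove trivial projections, which is why the second branch is written directly as a scan.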



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
