zabetak commented on code in PR #4783:
URL: https://github.com/apache/hive/pull/4783#discussion_r1355049106


##########
ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out:
##########
@@ -0,0 +1,250 @@
+PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl1
+POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl1
+PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl2
+POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl2
+PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl1
+PREHOOK: Output: default@tbl2@key=5
+PREHOOK: Output: default@tbl2@key=6
+POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl1
+POSTHOOK: Output: default@tbl2@key=5
+POSTHOOK: Output: default@tbl2@key=6
+STAGE DEPENDENCIES:
+  Stage-2 is a root stage
+  Stage-3 depends on stages: Stage-2
+  Stage-0 depends on stages: Stage-3
+  Stage-4 depends on stages: Stage-0
+  Stage-1 depends on stages: Stage-3
+  Stage-5 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-2
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+        Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: tbl1
+                  filterExpr: (key = 5) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Filter Operator
+                    predicate: (key = 5) (type: boolean)
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    Select Operator
+                      expressions: f1 (type: int)
+                      outputColumnNames: _col1
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Filter Operator
+                        predicate: (5 > 0) (type: boolean)

Review Comment:
   That's a weird predicate we have here. Isn't 5 > 0 always true. I am 
suprised that it was not removed by CBO or another Physical rule.



##########
ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out:
##########
@@ -0,0 +1,250 @@
+PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl1
+POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl1
+PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl2
+POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl2
+PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl1
+PREHOOK: Output: default@tbl2@key=5
+PREHOOK: Output: default@tbl2@key=6
+POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl1
+POSTHOOK: Output: default@tbl2@key=5
+POSTHOOK: Output: default@tbl2@key=6
+STAGE DEPENDENCIES:
+  Stage-2 is a root stage
+  Stage-3 depends on stages: Stage-2
+  Stage-0 depends on stages: Stage-3
+  Stage-4 depends on stages: Stage-0
+  Stage-1 depends on stages: Stage-3
+  Stage-5 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-2
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+        Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: tbl1
+                  filterExpr: (key = 5) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Filter Operator
+                    predicate: (key = 5) (type: boolean)
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    Select Operator
+                      expressions: f1 (type: int)
+                      outputColumnNames: _col1
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Filter Operator
+                        predicate: (5 > 0) (type: boolean)
+                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                        Reduce Output Operator
+                          key expressions: _col1 (type: int)
+                          null sort order: z
+                          sort order: +
+                          Map-reduce partition columns: _col1 (type: int)
+                          Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                          value expressions: 5 (type: int)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Forward
+                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                Filter Operator
+                  predicate: (VALUE._col0 > 0) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Group By Operator
+                    keys: KEY._col0 (type: int)
+                    mode: complete
+                    outputColumnNames: _col0
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    File Output Operator
+                      compressed: false
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      table:
+                          input format: 
org.apache.hadoop.mapred.TextInputFormat
+                          output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+                          serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                          name: default.tbl2
+                    Select Operator
+                      expressions: _col0 (type: int), UDFToInteger('5') (type: 
int)
+                      outputColumnNames: f1, key
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Group By Operator
+                        aggregations: min(f1), max(f1), count(1), count(f1), 
compute_bit_vector_hll(f1)
+                        keys: key (type: int)
+                        minReductionHashAggr: 0.99
+                        mode: hash
+                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5
+                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                        Reduce Output Operator
+                          key expressions: _col0 (type: int)
+                          null sort order: z
+                          sort order: +
+                          Map-reduce partition columns: _col0 (type: int)
+                          Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                          value expressions: _col1 (type: int), _col2 (type: 
int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: binary)
+                Filter Operator
+                  predicate: (VALUE._col0 > 0) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Group By Operator
+                    keys: KEY._col0 (type: int)
+                    mode: complete
+                    outputColumnNames: _col0
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    File Output Operator
+                      compressed: false
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      table:
+                          input format: 
org.apache.hadoop.mapred.TextInputFormat
+                          output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+                          serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                          name: default.tbl2

Review Comment:
   This Filter, Group By, File operators seems identical with the branch just 
above. Probably we could have spared some operations if we were doing smarter 
things at the CBO layer.



##########
ql/src/test/queries/clientpositive/multi_insert_gby5.q:
##########
@@ -0,0 +1,16 @@
+set hive.cbo.fallback.strategy=NEVER;
+
+CREATE TABLE tbl1 (key int, f1 int);
+CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int);
+
+EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a

Review Comment:
   The plan looks valid but just to be on the safe side let's add some data in 
the table to ensure we are getting back correct results since this is something 
that never worked before.



##########
ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out:
##########
@@ -0,0 +1,250 @@
+PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl1
+POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl1
+PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl2
+POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl2
+PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl1
+PREHOOK: Output: default@tbl2@key=5
+PREHOOK: Output: default@tbl2@key=6
+POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl1
+POSTHOOK: Output: default@tbl2@key=5
+POSTHOOK: Output: default@tbl2@key=6
+STAGE DEPENDENCIES:
+  Stage-2 is a root stage
+  Stage-3 depends on stages: Stage-2
+  Stage-0 depends on stages: Stage-3
+  Stage-4 depends on stages: Stage-0
+  Stage-1 depends on stages: Stage-3
+  Stage-5 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-2
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+        Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: tbl1
+                  filterExpr: (key = 5) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Filter Operator
+                    predicate: (key = 5) (type: boolean)
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    Select Operator
+                      expressions: f1 (type: int)
+                      outputColumnNames: _col1
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Filter Operator
+                        predicate: (5 > 0) (type: boolean)
+                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                        Reduce Output Operator
+                          key expressions: _col1 (type: int)
+                          null sort order: z
+                          sort order: +
+                          Map-reduce partition columns: _col1 (type: int)
+                          Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                          value expressions: 5 (type: int)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Forward
+                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                Filter Operator
+                  predicate: (VALUE._col0 > 0) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Group By Operator
+                    keys: KEY._col0 (type: int)
+                    mode: complete
+                    outputColumnNames: _col0
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    File Output Operator
+                      compressed: false
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      table:
+                          input format: 
org.apache.hadoop.mapred.TextInputFormat
+                          output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+                          serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                          name: default.tbl2
+                    Select Operator
+                      expressions: _col0 (type: int), UDFToInteger('5') (type: 
int)
+                      outputColumnNames: f1, key
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Group By Operator
+                        aggregations: min(f1), max(f1), count(1), count(f1), 
compute_bit_vector_hll(f1)
+                        keys: key (type: int)
+                        minReductionHashAggr: 0.99
+                        mode: hash
+                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5
+                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                        Reduce Output Operator
+                          key expressions: _col0 (type: int)
+                          null sort order: z
+                          sort order: +
+                          Map-reduce partition columns: _col0 (type: int)
+                          Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                          value expressions: _col1 (type: int), _col2 (type: 
int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: binary)

Review Comment:
   This part of the plan seems related to auto gather statistics. Can we 
disable this feature or gather stats before hand to simplify the plan that we 
have to actually validate.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -5996,7 +5996,7 @@ private ReduceSinkOperator 
genCommonGroupByPlanReduceSinkOperator(QB qb, List<St
         for (Map.Entry<ASTNode, ExprNodeDesc> entry : nodeOutputs.entrySet()) {
           ASTNode parameter = entry.getKey();
           ExprNodeDesc expression = entry.getValue();
-          if (!(expression instanceof ExprNodeColumnDesc)) {
+          if (!(expression instanceof ExprNodeColumnDesc) && 
!ExprNodeConstantDesc.isFoldedFromCol(expression)) {

Review Comment:
   Thanks for the elaborate answer. I don't fully grasp the problem but with 
your description I got a good intuition about it. 
   
   Checking if a constant is folded from a column is more specific than just 
being a constant in general thus should have smaller impact on existing 
queries. Since nothing breaks and it solves the problem I am good to go with it.



##########
ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out:
##########
@@ -0,0 +1,250 @@
+PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl1
+POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl1
+PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl2
+POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl2
+PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl1
+PREHOOK: Output: default@tbl2@key=5
+PREHOOK: Output: default@tbl2@key=6
+POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a
+INSERT OVERWRITE TABLE tbl2 PARTITION(key=5)
+SELECT f1 WHERE key > 0 GROUP BY f1
+INSERT OVERWRITE TABLE tbl2 partition(key=6)
+SELECT f1 WHERE key > 0 GROUP BY f1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl1
+POSTHOOK: Output: default@tbl2@key=5
+POSTHOOK: Output: default@tbl2@key=6
+STAGE DEPENDENCIES:
+  Stage-2 is a root stage
+  Stage-3 depends on stages: Stage-2
+  Stage-0 depends on stages: Stage-3
+  Stage-4 depends on stages: Stage-0
+  Stage-1 depends on stages: Stage-3
+  Stage-5 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-2
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+        Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: tbl1
+                  filterExpr: (key = 5) (type: boolean)
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                  Filter Operator
+                    predicate: (key = 5) (type: boolean)
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                    Select Operator
+                      expressions: f1 (type: int)
+                      outputColumnNames: _col1
+                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                      Filter Operator
+                        predicate: (5 > 0) (type: boolean)
+                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                        Reduce Output Operator
+                          key expressions: _col1 (type: int)
+                          null sort order: z
+                          sort order: +
+                          Map-reduce partition columns: _col1 (type: int)
+                          Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: NONE
+                          value expressions: 5 (type: int)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Forward
+                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
+                Filter Operator
+                  predicate: (VALUE._col0 > 0) (type: boolean)

Review Comment:
   This predicate also seems redundant. How come we missed it?
   
   The reason that I mention this is cause I get the feeling that we hit the 
error when we try to build this predicate that maybe shouldn't be there in the 
first place.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to