zabetak commented on code in PR #4783: URL: https://github.com/apache/hive/pull/4783#discussion_r1355049106
########## ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out: ########## @@ -0,0 +1,250 @@ +PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl1 +POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl1 +PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl2 +POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl2 +PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +PREHOOK: type: QUERY +PREHOOK: Input: default@tbl1 +PREHOOK: Output: default@tbl2@key=5 +PREHOOK: Output: default@tbl2@key=6 +POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@tbl1 +POSTHOOK: Output: default@tbl2@key=5 +POSTHOOK: Output: default@tbl2@key=6 +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-3 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-3 + Stage-4 depends on stages: Stage-0 + Stage-1 depends on stages: Stage-3 + Stage-5 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Tez +#### A masked pattern was here #### + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) + Reducer 3 <- Reducer 2 (SIMPLE_EDGE) + Reducer 4 <- Reducer 2 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: tbl1 + filterExpr: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: f1 (type: int) + outputColumnNames: _col1 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (5 > 0) (type: boolean) Review Comment: That's a weird predicate we have here. Isn't 5 > 0 always true. I am suprised that it was not removed by CBO or another Physical rule. ########## ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out: ########## @@ -0,0 +1,250 @@ +PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl1 +POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl1 +PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl2 +POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl2 +PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +PREHOOK: type: QUERY +PREHOOK: Input: default@tbl1 +PREHOOK: Output: default@tbl2@key=5 +PREHOOK: Output: default@tbl2@key=6 +POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@tbl1 +POSTHOOK: Output: default@tbl2@key=5 +POSTHOOK: Output: default@tbl2@key=6 +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-3 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-3 + Stage-4 depends on stages: Stage-0 + Stage-1 depends on stages: Stage-3 + Stage-5 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Tez +#### A masked pattern was here #### + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) + Reducer 3 <- Reducer 2 (SIMPLE_EDGE) + Reducer 4 <- Reducer 2 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: tbl1 + filterExpr: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: f1 (type: int) + outputColumnNames: _col1 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (5 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col1 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col1 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: 5 (type: int) + Execution mode: vectorized, llap + LLAP IO: all inputs + Reducer 2 + Execution mode: llap + Reduce Operator Tree: + Forward + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (VALUE._col0 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Group By Operator + keys: KEY._col0 (type: int) + mode: complete + outputColumnNames: _col0 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + name: default.tbl2 + Select Operator + expressions: _col0 (type: int), UDFToInteger('5') (type: int) + outputColumnNames: f1, key + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Group By Operator + aggregations: min(f1), max(f1), count(1), count(f1), compute_bit_vector_hll(f1) + keys: key (type: int) + minReductionHashAggr: 0.99 + mode: hash + outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col0 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col0 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: binary) + Filter Operator + predicate: (VALUE._col0 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Group By Operator + keys: KEY._col0 (type: int) + mode: complete + outputColumnNames: _col0 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + name: default.tbl2 Review Comment: This Filter, Group By, File operators seems identical with the branch just above. Probably we could have spared some operations if we were doing smarter things at the CBO layer. ########## ql/src/test/queries/clientpositive/multi_insert_gby5.q: ########## @@ -0,0 +1,16 @@ +set hive.cbo.fallback.strategy=NEVER; + +CREATE TABLE tbl1 (key int, f1 int); +CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int); + +EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a Review Comment: The plan looks valid but just to be on the safe side let's add some data in the table to ensure we are getting back correct results since this is something that never worked before. ########## ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out: ########## @@ -0,0 +1,250 @@ +PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl1 +POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl1 +PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl2 +POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl2 +PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +PREHOOK: type: QUERY +PREHOOK: Input: default@tbl1 +PREHOOK: Output: default@tbl2@key=5 +PREHOOK: Output: default@tbl2@key=6 +POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@tbl1 +POSTHOOK: Output: default@tbl2@key=5 +POSTHOOK: Output: default@tbl2@key=6 +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-3 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-3 + Stage-4 depends on stages: Stage-0 + Stage-1 depends on stages: Stage-3 + Stage-5 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Tez +#### A masked pattern was here #### + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) + Reducer 3 <- Reducer 2 (SIMPLE_EDGE) + Reducer 4 <- Reducer 2 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: tbl1 + filterExpr: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: f1 (type: int) + outputColumnNames: _col1 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (5 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col1 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col1 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: 5 (type: int) + Execution mode: vectorized, llap + LLAP IO: all inputs + Reducer 2 + Execution mode: llap + Reduce Operator Tree: + Forward + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (VALUE._col0 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Group By Operator + keys: KEY._col0 (type: int) + mode: complete + outputColumnNames: _col0 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + name: default.tbl2 + Select Operator + expressions: _col0 (type: int), UDFToInteger('5') (type: int) + outputColumnNames: f1, key + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Group By Operator + aggregations: min(f1), max(f1), count(1), count(f1), compute_bit_vector_hll(f1) + keys: key (type: int) + minReductionHashAggr: 0.99 + mode: hash + outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col0 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col0 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: binary) Review Comment: This part of the plan seems related to auto gather statistics. Can we disable this feature or gather stats before hand to simplify the plan that we have to actually validate. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ########## @@ -5996,7 +5996,7 @@ private ReduceSinkOperator genCommonGroupByPlanReduceSinkOperator(QB qb, List<St for (Map.Entry<ASTNode, ExprNodeDesc> entry : nodeOutputs.entrySet()) { ASTNode parameter = entry.getKey(); ExprNodeDesc expression = entry.getValue(); - if (!(expression instanceof ExprNodeColumnDesc)) { + if (!(expression instanceof ExprNodeColumnDesc) && !ExprNodeConstantDesc.isFoldedFromCol(expression)) { Review Comment: Thanks for the elaborate answer. I don't fully grasp the problem but with your description I got a good intuition about it. Checking if a constant is folded from a column is more specific than just being a constant in general thus should have smaller impact on existing queries. Since nothing breaks and it solves the problem I am good to go with it. ########## ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out: ########## @@ -0,0 +1,250 @@ +PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl1 +POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl1 +PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl2 +POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl2 +PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +PREHOOK: type: QUERY +PREHOOK: Input: default@tbl1 +PREHOOK: Output: default@tbl2@key=5 +PREHOOK: Output: default@tbl2@key=6 +POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@tbl1 +POSTHOOK: Output: default@tbl2@key=5 +POSTHOOK: Output: default@tbl2@key=6 +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-3 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-3 + Stage-4 depends on stages: Stage-0 + Stage-1 depends on stages: Stage-3 + Stage-5 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Tez +#### A masked pattern was here #### + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) + Reducer 3 <- Reducer 2 (SIMPLE_EDGE) + Reducer 4 <- Reducer 2 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: tbl1 + filterExpr: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: f1 (type: int) + outputColumnNames: _col1 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (5 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col1 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col1 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: 5 (type: int) + Execution mode: vectorized, llap + LLAP IO: all inputs + Reducer 2 + Execution mode: llap + Reduce Operator Tree: + Forward + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (VALUE._col0 > 0) (type: boolean) Review Comment: This predicate also seems redundant. How come we missed it? The reason that I mention this is cause I get the feeling that we hit the error when we try to build this predicate that maybe shouldn't be there in the first place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org