jcamachor commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r511717213

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query32.q.out
##########
@@ -160,7 +160,7 @@ Stage-0
  Select Operator [SEL_115] (rows=286549727 width=119)
  Output:["_col0","_col1","_col2"]
  Filter Operator [FIL_113] (rows=286549727 width=119)
- predicate:(cs_sold_date_sk is not null and cs_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(cs_item_sk, DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
SJ is gone. Is this expected?

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query92.q.out
##########
@@ -164,7 +164,7 @@ Stage-0
  Select Operator [SEL_115] (rows=143966864 width=119)
  Output:["_col0","_col1","_col2"]
  Filter Operator [FIL_113] (rows=143966864 width=119)
- predicate:(ws_sold_date_sk is not null and ws_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(ws_item_sk, DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
SJ got removed. Is this expected?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -1136,4 +1173,30 @@ public static boolean isOr(ExprNodeDesc expr) {
  return false;
  }
+ public static boolean isAnd(ExprNodeDesc expr) {
+ if (expr instanceof ExprNodeGenericFuncDesc) {

Review comment:
I think you could use `ExprNodeDescExprFactory.isANDFuncCallExpr` or `FunctionRegistry.isOpAnd(expr)`?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -2595,6 +2595,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
  HIVE_SHARED_WORK_DPPUNION_OPTIMIZATION("hive.optimize.shared.work.dppunion", true,
  "Enables dppops unioning. This optimization will enable to merge multiple tablescans with different " +
  "dynamic filters into a single one (with a more complex filter)"),
+ HIVE_SHARED_WORK_DOWNSTREAM_MERGE("hive.optimize.shared.work.downstream.merge", true,
+ "Analyzes and merges equiv downstream operators after a successfull shared work optimization step."),

Review comment:
nit. typo 'successfull'

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query1b.q.out
##########
@@ -176,7 +176,7 @@ STAGE PLANS:
  Map Operator Tree:
  TableScan
  alias: store_returns
- filterExpr: (((sr_customer_sk is not null and sr_store_sk is not null and sr_returned_date_sk is not null) or (sr_store_sk is not null and sr_returned_date_sk is not null)) and sr_store_sk BETWEEN DynamicValue(RS_40_store_s_store_sk_min) AND DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, DynamicValue(RS_40_store_s_store_sk_bloom_filter))) (type: boolean)
+ filterExpr: (sr_store_sk BETWEEN DynamicValue(RS_40_store_s_store_sk_min) AND DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, DynamicValue(RS_40_store_s_store_sk_bloom_filter)) and ((sr_customer_sk is not null and sr_store_sk is not null and sr_returned_date_sk is not null) or (sr_store_sk is not null and sr_returned_date_sk is not null))) (type: boolean)

Review comment:
Same as above.
Filter exprs order

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query54.q.out
##########
@@ -202,156 +202,154 @@ Stage-0
  predicate:(_col1 <= _col3)
  Merge Join Operator [MERGEJOIN_294] (rows=15218525 width=12)
  Conds:(Inner),Output:["_col0","_col1","_col3"]
- <-Reducer 15 [CUSTOM_SIMPLE_EDGE]
+ <-Reducer 20 [CUSTOM_SIMPLE_EDGE]
  PARTITION_ONLY_SHUFFLE [RS_99]
  Filter Operator [FIL_98] (rows=608741 width=12)
  predicate:(_col2 <= _col1)
  Merge Join Operator [MERGEJOIN_291] (rows=1826225 width=12)
  Conds:(Inner),Output:["_col0","_col1","_col2"]
  <-Map 9 [CUSTOM_SIMPLE_EDGE] vectorized
- PARTITION_ONLY_SHUFFLE [RS_327]

Review comment:
Change of algorithm to SHUFFLE. Is this expected? It seems the same change happened for multiple ops in the plan.

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query25.q.out
##########
@@ -176,92 +176,88 @@ Stage-0
  Merge Join Operator [MERGEJOIN_246] (rows=21091882 width=154)
  Conds:RS_25._col2, _col1, _col4=RS_26._col2, _col1, _col3(Inner),Output:["_col1","_col3","_col5","_col8","_col9","_col11"]
  <-Reducer 10 [SIMPLE_EDGE]
+ SHUFFLE [RS_26]
+ PartitionCols:_col2, _col1, _col3
+ Merge Join Operator [MERGEJOIN_245] (rows=9402909 width=100)
+ Conds:RS_270._col0=RS_256._col0(Inner),Output:["_col1","_col2","_col3","_col4"]
+ <-Map 8 [SIMPLE_EDGE] vectorized
+ PARTITION_ONLY_SHUFFLE [RS_256]
+ PartitionCols:_col0
+ Select Operator [SEL_252] (rows=351 width=4)
+ Output:["_col0"]
+ Filter Operator [FIL_250] (rows=351 width=12)
+ predicate:((d_year = 2000) and d_moy BETWEEN 4 AND 10)
+ TableScan [TS_3] (rows=73049 width=12)
+ default@date_dim,d3,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year","d_moy"]

Review comment:
Are these new TS that are not reused anymore? For instance, it seems this one is the same that was reused in old L224.

##########
File path: ql/src/test/results/clientpositive/llap/sharedwork_semi.q.out
##########
@@ -541,7 +541,7 @@ STAGE PLANS:
  Map Operator Tree:
  TableScan
  alias: s
- filterExpr: (ss_sold_date_sk is not null and ((ss_sold_date_sk BETWEEN DynamicValue(RS_7_d_d_date_sk_min) AND DynamicValue(RS_7_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_21_d_d_date_sk_min) AND DynamicValue(RS_21_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_21_d_d_date_sk_bloom_filter))))) (type: boolean)
+ filterExpr: (((ss_sold_date_sk BETWEEN DynamicValue(RS_7_d_d_date_sk_min) AND DynamicValue(RS_7_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_21_d_d_date_sk_min) AND DynamicValue(RS_21_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_21_d_d_date_sk_bloom_filter)))) and ss_sold_date_sk is not null) (type: boolean)

Review comment:
IIRC, we intentionally order the expressions so that the other expressions are evaluated before the SJ expression, since probing the bloom filter is usually more expensive than evaluating the other expressions (heuristic).

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query1b.q.out
##########
@@ -210,7 +210,7 @@ STAGE PLANS:
  Statistics: Num rows: 16855704 Data size: 2008197920 Basic stats: COMPLETE Column stats: COMPLETE
  value expressions: _col2 (type: decimal(17,2))
  Filter Operator
- predicate: (sr_store_sk is not null and sr_returned_date_sk is not null and sr_store_sk BETWEEN DynamicValue(RS_40_store_s_store_sk_min) AND DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, DynamicValue(RS_40_store_s_store_sk_bloom_filter))) (type: boolean)
+ predicate: (sr_store_sk is not null and sr_returned_date_sk is not null) (type: boolean)

Review comment:
SJ went away?
It is kept in the other branch (L182).
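
To illustrate the filter expression ordering concern raised for sharedwork_semi.q.out and query1b.q.out, here is a minimal sketch of the heuristic: conjuncts that probe a semijoin bloom filter (or compare against its min/max `DynamicValue`) are moved after the cheaper conjuncts. This is not Hive's implementation. `FilterOrderSketch`, `Conjunct`, and `cheapFirst` are hypothetical names, and matching on the conjunct text is an assumption made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterOrderSketch {

    // Hypothetical stand-in for one conjunct of a filter; not a Hive class.
    static final class Conjunct {
        final String text;

        Conjunct(String text) {
            this.text = text;
        }

        // Assumption: a semijoin-derived conjunct is recognized by its text.
        boolean isSemiJoinProbe() {
            return text.contains("in_bloom_filter(") || text.contains("DynamicValue(");
        }
    }

    // Returns a copy with the non-semijoin conjuncts first.
    // List.sort is stable, so the relative order inside each group is kept.
    static List<Conjunct> cheapFirst(List<Conjunct> conjuncts) {
        List<Conjunct> ordered = new ArrayList<>(conjuncts);
        ordered.sort((a, b) -> Boolean.compare(a.isSemiJoinProbe(), b.isSemiJoinProbe()));
        return ordered;
    }

    public static void main(String[] args) {
        List<Conjunct> conjuncts = Arrays.asList(
            new Conjunct("in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))"),
            new Conjunct("ss_sold_date_sk is not null"));
        for (Conjunct c : cheapFirst(conjuncts)) {
            System.out.println(c.text);
        }
    }
}
```

With an ordering like this, the `is not null` check would stay ahead of the bloom filter probe, which is the order the old plans show.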