jcamachor commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r511717213



##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query32.q.out
##########
@@ -160,7 +160,7 @@ Stage-0
                                     Select Operator [SEL_115] (rows=286549727 
width=119)
                                       Output:["_col0","_col1","_col2"]
                                       Filter Operator [FIL_113] 
(rows=286549727 width=119)
-                                        predicate:(cs_sold_date_sk is not null 
and cs_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND 
DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(cs_item_sk, 
DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
       SJ is gone. Is this expected?

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query92.q.out
##########
@@ -164,7 +164,7 @@ Stage-0
                                     Select Operator [SEL_115] (rows=143966864 
width=119)
                                       Output:["_col0","_col1","_col2"]
                                       Filter Operator [FIL_113] 
(rows=143966864 width=119)
-                                        predicate:(ws_sold_date_sk is not null 
and ws_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND 
DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(ws_item_sk, 
DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
       SJ got removed. Is this expected?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -1136,4 +1173,30 @@ public static boolean isOr(ExprNodeDesc expr) {
     return false;
   }
 
+  public static boolean isAnd(ExprNodeDesc expr) {
+    if (expr instanceof ExprNodeGenericFuncDesc) {

Review comment:
       I think you could use `ExprNodeDescExprFactory.isANDFuncCallExpr` or 
`FunctionRegistry.isOpAnd(expr)`?
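
       A rough sketch of the idea, using hypothetical stand-in types rather 
than the real Hive classes: `Expr`, `Column`, `Func` and the local `isOpAnd` 
below are assumptions made only for illustration. The point is that `isAnd` can 
delegate to one shared check instead of repeating the `instanceof` logic.

```java
import java.util.List;

// Hypothetical sketch, not Hive code: the types below only stand in for
// ExprNodeDesc and friends so that the example is self-contained.
final class AndCheckSketch {

  interface Expr {}

  static final class Column implements Expr {
    final String name;
    Column(String name) { this.name = name; }
  }

  static final class Func implements Expr {
    final String op;
    final List<Expr> args;
    Func(String op, List<Expr> args) { this.op = op; this.args = args; }
  }

  // Stand-in for a shared, registry-style helper (assumed behaviour:
  // true only for a function call whose operator is "and").
  static boolean isOpAnd(Expr expr) {
    return expr instanceof Func && "and".equals(((Func) expr).op);
  }

  // The utility method then reduces to a one-line delegation.
  static boolean isAnd(Expr expr) {
    return isOpAnd(expr);
  }

  public static void main(String[] args) {
    Expr a = new Column("a");
    Expr b = new Column("b");
    System.out.println(isAnd(new Func("and", List.of(a, b))));
    System.out.println(isAnd(a));
  }
}
```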

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -2595,6 +2595,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_SHARED_WORK_DPPUNION_OPTIMIZATION("hive.optimize.shared.work.dppunion", true,
         "Enables dppops unioning. This optimization will enable to merge multiple tablescans with different "
             + "dynamic filters into a single one (with a more complex filter)"),
+    HIVE_SHARED_WORK_DOWNSTREAM_MERGE("hive.optimize.shared.work.downstream.merge", true,
+        "Analyzes and merges equiv downstream operators after a successfull shared work optimization step."),

Review comment:
       nit: typo 'successfull' (should be 'successful')

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query1b.q.out
##########
@@ -176,7 +176,7 @@ STAGE PLANS:
             Map Operator Tree:
                 TableScan
                   alias: store_returns
-                  filterExpr: (((sr_customer_sk is not null and sr_store_sk is 
not null and sr_returned_date_sk is not null) or (sr_store_sk is not null and 
sr_returned_date_sk is not null)) and sr_store_sk BETWEEN 
DynamicValue(RS_40_store_s_store_sk_min) AND 
DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, 
DynamicValue(RS_40_store_s_store_sk_bloom_filter))) (type: boolean)
+                  filterExpr: (sr_store_sk BETWEEN 
DynamicValue(RS_40_store_s_store_sk_min) AND 
DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, 
DynamicValue(RS_40_store_s_store_sk_bloom_filter)) and ((sr_customer_sk is not 
null and sr_store_sk is not null and sr_returned_date_sk is not null) or 
(sr_store_sk is not null and sr_returned_date_sk is not null))) (type: boolean)

Review comment:
       Same as above: the order of the filter expressions changed.

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query54.q.out
##########
@@ -202,156 +202,154 @@ Stage-0
                                           predicate:(_col1 <= _col3)
                                           Merge Join Operator [MERGEJOIN_294] 
(rows=15218525 width=12)
                                             
Conds:(Inner),Output:["_col0","_col1","_col3"]
-                                          <-Reducer 15 [CUSTOM_SIMPLE_EDGE]
+                                          <-Reducer 20 [CUSTOM_SIMPLE_EDGE]
                                             PARTITION_ONLY_SHUFFLE [RS_99]
                                               Filter Operator [FIL_98] 
(rows=608741 width=12)
                                                 predicate:(_col2 <= _col1)
                                                 Merge Join Operator 
[MERGEJOIN_291] (rows=1826225 width=12)
                                                   
Conds:(Inner),Output:["_col0","_col1","_col2"]
                                                 <-Map 9 [CUSTOM_SIMPLE_EDGE] 
vectorized
-                                                  PARTITION_ONLY_SHUFFLE 
[RS_327]

Review comment:
       Change of algorithm to SHUFFLE. Is this expected? It seems the same 
change happened for multiple ops in the plan.

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query25.q.out
##########
@@ -176,92 +176,88 @@ Stage-0
                                   Merge Join Operator [MERGEJOIN_246] 
(rows=21091882 width=154)
                                     Conds:RS_25._col2, _col1, 
_col4=RS_26._col2, _col1, 
_col3(Inner),Output:["_col1","_col3","_col5","_col8","_col9","_col11"]
                                   <-Reducer 10 [SIMPLE_EDGE]
+                                    SHUFFLE [RS_26]
+                                      PartitionCols:_col2, _col1, _col3
+                                      Merge Join Operator [MERGEJOIN_245] 
(rows=9402909 width=100)
+                                        
Conds:RS_270._col0=RS_256._col0(Inner),Output:["_col1","_col2","_col3","_col4"]
+                                      <-Map 8 [SIMPLE_EDGE] vectorized
+                                        PARTITION_ONLY_SHUFFLE [RS_256]
+                                          PartitionCols:_col0
+                                          Select Operator [SEL_252] (rows=351 
width=4)
+                                            Output:["_col0"]
+                                            Filter Operator [FIL_250] 
(rows=351 width=12)
+                                              predicate:((d_year = 2000) and 
d_moy BETWEEN 4 AND 10)
+                                              TableScan [TS_3] (rows=73049 
width=12)
+                                                
default@date_dim,d3,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year","d_moy"]

Review comment:
       Are these new TSs that are no longer reused? For instance, this one 
seems to be the same one that was reused at old L224.

##########
File path: ql/src/test/results/clientpositive/llap/sharedwork_semi.q.out
##########
@@ -541,7 +541,7 @@ STAGE PLANS:
             Map Operator Tree:
                 TableScan
                   alias: s
-                  filterExpr: (ss_sold_date_sk is not null and 
((ss_sold_date_sk BETWEEN DynamicValue(RS_7_d_d_date_sk_min) AND 
DynamicValue(RS_7_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_7_d_d_date_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN 
DynamicValue(RS_21_d_d_date_sk_min) AND DynamicValue(RS_21_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter))))) (type: boolean)
+                  filterExpr: (((ss_sold_date_sk BETWEEN 
DynamicValue(RS_7_d_d_date_sk_min) AND DynamicValue(RS_7_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))) 
or (ss_sold_date_sk BETWEEN DynamicValue(RS_21_d_d_date_sk_min) AND 
DynamicValue(RS_21_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter)))) and ss_sold_date_sk is not 
null) (type: boolean)

Review comment:
       IIRC, we intentionally order the expressions so that the other 
expressions are evaluated before the SJ expression, since probing the bloom 
filter is usually more expensive than evaluating the other expressions 
(heuristic).
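
       To illustrate the ordering I mean, here is a minimal sketch. It is not 
the Hive implementation: conjuncts are modelled as plain strings, and 
`ConjunctOrderingSketch`, `isBloomFilterProbe` and `cheapFirst` are hypothetical 
names. It only shows the heuristic of moving any conjunct that probes a bloom 
filter to the end while keeping the other conjuncts in their original order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hive code: each conjunct of an AND predicate
// is represented by its printed form.
final class ConjunctOrderingSketch {

  // Assumption: a conjunct counts as expensive if it contains a bloom filter probe.
  static boolean isBloomFilterProbe(String conjunct) {
    return conjunct.contains("in_bloom_filter(");
  }

  // Returns a copy with bloom filter probes moved last.
  // List.sort is stable, so the relative order within each group is preserved.
  static List<String> cheapFirst(List<String> conjuncts) {
    List<String> ordered = new ArrayList<>(conjuncts);
    ordered.sort(Comparator.comparing(ConjunctOrderingSketch::isBloomFilterProbe));
    return ordered;
  }

  public static void main(String[] args) {
    List<String> conjuncts = List.of(
        "(in_bloom_filter(ss_sold_date_sk, f7) or in_bloom_filter(ss_sold_date_sk, f21))",
        "ss_sold_date_sk is not null");
    // The cheap null check is printed before the bloom filter probes.
    System.out.println(String.join(" and ", cheapFirst(conjuncts)));
  }
}
```

       With this ordering, rows rejected by the cheap null check never reach 
the bloom filter probe.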

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query1b.q.out
##########
@@ -210,7 +210,7 @@ STAGE PLANS:
                             Statistics: Num rows: 16855704 Data size: 
2008197920 Basic stats: COMPLETE Column stats: COMPLETE
                             value expressions: _col2 (type: decimal(17,2))
                   Filter Operator
-                    predicate: (sr_store_sk is not null and 
sr_returned_date_sk is not null and sr_store_sk BETWEEN 
DynamicValue(RS_40_store_s_store_sk_min) AND 
DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, 
DynamicValue(RS_40_store_s_store_sk_bloom_filter))) (type: boolean)
+                    predicate: (sr_store_sk is not null and 
sr_returned_date_sk is not null) (type: boolean)

Review comment:
       SJ went away? It is kept in the other branch (L182).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


