[GitHub] [hive] kasakrisz commented on a change in pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

GitBox Thu, 30 Jul 2020 23:36:43 -0700


kasakrisz commented on a change in pull request #1324:
URL: https://github.com/apache/hive/pull/1324#discussion_r463430942




##########
File path: ql/src/test/results/clientpositive/llap/auto_join_reordering_values.q.out
##########
@@ -144,122 +144,30 @@ STAGE PLANS:
                         tag: 0
                         value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int)
                         auto parallelism: true
-            Execution mode: vectorized, llap
-            LLAP IO: no inputs
-            Path -> Alias:
-#### A masked pattern was here ####
-            Path -> Partition:
-#### A masked pattern was here ####
-                Partition
-                  base file name: orderpayment_small
-                  input format: org.apache.hadoop.mapred.TextInputFormat
-                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-                  properties:
-                    bucket_count -1
-                    bucketing_version 2
-                    column.name.delimiter ,
-                    columns dealid,date,time,cityid,userid
-                    columns.types int:string:string:int:int
-#### A masked pattern was here ####
-                    name default.orderpayment_small
-                    serialization.format 1
-                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-                
-                    input format: org.apache.hadoop.mapred.TextInputFormat
-                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-                    properties:
-                      bucketing_version 2
-                      column.name.delimiter ,
-                      columns dealid,date,time,cityid,userid
-                      columns.comments 
-                      columns.types int:string:string:int:int
-#### A masked pattern was here ####
-                      name default.orderpayment_small
-                      serialization.format 1
-                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-                    name: default.orderpayment_small
-                  name: default.orderpayment_small
-            Truncated Path -> Alias:
-              /orderpayment_small [orderpayment]
-        Map 6 
-            Map Operator Tree:
-                TableScan
-                  alias: dim_pay_date
-                  filterExpr: date is not null (type: boolean)
-                  Statistics: Num rows: 1 Data size: 94 Basic stats: COMPLETE Column stats: COMPLETE
-                  GatherStats: false
                   Filter Operator
                     isSamplingPred: false
-                    predicate: date is not null (type: boolean)
-                    Statistics: Num rows: 1 Data size: 94 Basic stats: COMPLETE Column stats: COMPLETE
+                    predicate: dealid is not null (type: boolean)
+                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                     Select Operator
-                      expressions: date (type: string)

Review comment:
   The TS that pulls only the column `date` was not merged because `validPreConditions` failed:
   ```
   2020-07-30T23:20:32,796 DEBUG [a51ac124-5fb2-43bc-a0fc-f27099762584 main] optimizer.SharedWorkOptimizer: After SharedWorkSJOptimizer:
   TS[0]-FIL[44]-SEL[2]-RS[15]-MERGEJOIN[89]-RS[18]-MERGEJOIN[90]-RS[21]-MERGEJOIN[91]-RS[24]-MERGEJOIN[92]-SEL[27]-LIM[28]-FS[29]
   TS[3]-FIL[45]-SEL[5]-RS[16]-MERGEJOIN[89]
   TS[6]-FIL[46]-SEL[8]-RS[19]-MERGEJOIN[90]
   TS[9]-FIL[47]-SEL[11]-RS[22]-MERGEJOIN[91]
   TS[12]-FIL[48]-SEL[14]-RS[25]-MERGEJOIN[92]
   ```
   
   Both TableScans have the same output works:
   ```
   TS[0] 
   alias = "orderpayment"
   dbName = "default"
   tableName = "orderpayment_small"
   neededColumns = {ArrayList@24085}  size = 4
    0 = "dealid"
    1 = "date"
    2 = "cityid"
    3 = "userid"
   outputWorksOps1 = {HashSet@24017}  size = 2
    0 = {ReduceSinkOperator@24028} "RS[18]"
    1 = {CommonMergeJoinOperator@24029} "MERGEJOIN[89]"
   ```
   ```
   TS[3]
   alias = "dim_pay_date"
   dbName = "default"
   tableName = "orderpayment_small"
   neededColumns = {ArrayList@23791}  size = 1
    0 = "date"
   outputWorksOps2 = {HashSet@24022}  size = 2
    0 = {ReduceSinkOperator@24028} "RS[18]"
    1 = {CommonMergeJoinOperator@24029} "MERGEJOIN[89]"
   ```
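
   To make the two dumps above easier to compare, here is a minimal, self-contained sketch. It is not the code in `SharedWorkOptimizer`: the class name `ScanMergeSketch` and both helper methods are hypothetical, operators are represented by their display names only, and it assumes the failing precondition is the one that rejects a merge when the two scans' output works overlap. `unionColumns` shows the column union this PR is about.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;

   /** Illustrative only; hypothetical names, not Hive's SharedWorkOptimizer code. */
   final class ScanMergeSketch {

       /** Union of the needed columns of two scans, keeping first-seen order. */
       static List<String> unionColumns(List<String> cols1, List<String> cols2) {
           Set<String> union = new LinkedHashSet<>(cols1);
           union.addAll(cols2);
           return new ArrayList<>(union);
       }

       /** Assumed precondition: true when both scans feed at least one common operator. */
       static boolean outputWorksCollide(Set<String> works1, Set<String> works2) {
           return !Collections.disjoint(works1, works2);
       }

       public static void main(String[] args) {
           // Values copied from the TS[0] and TS[3] dumps above.
           List<String> ts0Cols = Arrays.asList("dealid", "date", "cityid", "userid");
           List<String> ts3Cols = Arrays.asList("date");
           Set<String> ts0Works = new LinkedHashSet<>(Arrays.asList("RS[18]", "MERGEJOIN[89]"));
           Set<String> ts3Works = new LinkedHashSet<>(Arrays.asList("RS[18]", "MERGEJOIN[89]"));

           System.out.println(unionColumns(ts0Cols, ts3Cols));        // [dealid, date, cityid, userid]
           System.out.println(outputWorksCollide(ts0Works, ts3Works)); // true
       }
   }
   ```

   Under that assumption the union would add no new columns to TS[0], yet the merge is still rejected because the output works are identical.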
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org
