[
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276526#comment-16276526
]
liyunzhang commented on HIVE-17486:
-----------------------------------
In explain.28.scan.share.true we can see that there is only one operator (TS) in Map 1, and the operators from the child of the TS down to the RS belong to other Maps (Map 12, Map 15, Map 18, Map 2, Map 6, Map 9). So the current {{M-R}} shape within one SparkTask changes to {{M-M-R}}:
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Spark
Edges:
Map 12 <- Map 1 (NONE, 1000)
Map 15 <- Map 1 (NONE, 1000)
Map 18 <- Map 1 (NONE, 1000)
Map 2 <- Map 1 (NONE, 1000)
Map 6 <- Map 1 (NONE, 1000)
Map 9 <- Map 1 (NONE, 1000)
Reducer 10 <- Map 9 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 13 <- Map 12 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 16 <- Map 15 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 19 <- Map 18 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 3 <- Map 2 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 4 <- Reducer 10 (PARTITION-LEVEL SORT, 1), Reducer 13
(PARTITION-LEVEL SORT, 1), Reducer 16 (PARTITION-LEVEL SORT, 1), Reducer 19
(PARTITION-LEVEL SORT, 1), Reducer 3 (PARTITION-LEVEL SORT, 1), Reducer 7
(PARTITION-LEVEL SORT, 1)
Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1)
DagName: root_20171204042631_0435ff7e-3f10-4c84-a5fc-dc5b607497ba:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: ((ss_quantity BETWEEN 0 AND 5 and (ss_list_price
BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost
BETWEEN 14 AND 34)) or (ss_quantity BETWEEN 6 AND 10 and (ss_list_price BETWEEN
91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost BETWEEN
32 AND 52)) or (ss_quantity BETWEEN 11 AND 15 and (ss_list_price BETWEEN 66 AND
76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost BETWEEN 4 AND
24)) or (ss_quantity BETWEEN 16 AND 20 and (ss_list_price BETWEEN 142 AND 152
or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost BETWEEN 80 AND
100)) or (ss_quantity BETWEEN 21 AND 25 and (ss_list_price BETWEEN 135 AND 145
or ss_coupon_amt BETWEEN 14180 AND 15180 or ss_wholesale_cost BETWEEN 38 AND
58)) or (ss_quantity BETWEEN 26 AND 30 and (ss_list_price BETWEEN 28 AND 38 or
ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost BETWEEN 42 AND 62)))
(type: boolean)
Statistics: Num rows: 28800991 Data size: 4751513940 Basic
stats: COMPLETE Column stats: NONE
Execution mode: vectorized
Map 12
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 16 AND 20 and (ss_list_price
BETWEEN 142 AND 152 or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost
BETWEEN 80 AND 100)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 15
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 21 AND 25 and (ss_list_price
BETWEEN 135 AND 145 or ss_coupon_amt BETWEEN 14180 AND 15180 or
ss_wholesale_cost BETWEEN 38 AND 58)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 18
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 26 AND 30 and (ss_list_price
BETWEEN 28 AND 38 or ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost
BETWEEN 42 AND 62)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 2
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 0 AND 5 and (ss_list_price
BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost
BETWEEN 14 AND 34)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 6
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 6 AND 10 and (ss_list_price
BETWEEN 91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost
BETWEEN 32 AND 52)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 9
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 11 AND 15 and (ss_list_price
BETWEEN 66 AND 76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost
BETWEEN 4 AND 24)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Reducer 10
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 13
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 16
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 19
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 4
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
Inner Join 0 to 2
Inner Join 0 to 3
Inner Join 0 to 4
Inner Join 0 to 5
keys:
0
1
2
3
4
5
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17
Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE
Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 625 Basic stats:
COMPLETE Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 7
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
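To make the shape above concrete, here is a minimal sketch (plain Python, not Hive code) of one shared "table scan" feeding several filter-plus-aggregate branches, mirroring the single TS in Map 1 and the six downstream Maps. The column names (ss_quantity, ss_list_price) and the avg/count/count(distinct) aggregations follow the plan; the sample rows and the branch bounds are made up for illustration.

```python
# One shared scan, many consumers: an illustrative sketch of the
# M-M-R shape in the plan above. Not Hive internals.

rows = [
    {"ss_quantity": 3,  "ss_list_price": 15.0},
    {"ss_quantity": 4,  "ss_list_price": 18.0},
    {"ss_quantity": 8,  "ss_list_price": 95.0},
    {"ss_quantity": 13, "ss_list_price": 70.0},
]

# Each branch is a quantity range, like the six Filter Operators.
branches = [(0, 5), (6, 10), (11, 15)]

def aggregate(prices):
    """avg, count, count(distinct) over ss_list_price, as in the plan."""
    if not prices:
        return (None, 0, 0)
    return (sum(prices) / len(prices), len(prices), len(set(prices)))

# The scan happens once; every branch reuses the same cached rows,
# which is the point of sharing the TS instead of scanning per branch.
scanned = list(rows)  # stands in for the cached output of the shared TS
results = {
    (lo, hi): aggregate([r["ss_list_price"] for r in scanned
                         if lo <= r["ss_quantity"] <= hi])
    for lo, hi in branches
}
print(results)
```

In the real plan the branch results are then joined in Reducer 4; here the dict of per-branch aggregates plays that role.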
> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
> Key: HIVE-17486
> URL: https://issues.apache.org/jira/browse/HIVE-17486
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang
> Assignee: liyunzhang
> Attachments: scanshare.after.svg, scanshare.before.svg
>
>
> In HIVE-16602, shared scans were implemented with Tez. Given a query plan,
> the goal is to identify scans on input tables that can be merged so the data
> is read only once. The optimization is carried out at the physical level. In
> Hive on Spark, the result of a spark work is cached if that spark work is
> used by more than one child spark work. After SharedWorkOptimizer is enabled
> in the physical plan in HoS, identical table scans are merged into one table
> scan. The result of this table scan is then used by more than one child
> spark work, so the cache mechanism saves us from repeating the same
> computation.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)