[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

Laljo John Pullokkaran (JIRA) Thu, 21 May 2015 11:38:31 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554822#comment-14554822
 ]


Laljo John Pullokkaran commented on HIVE-9069:
----------------------------------------------

w.r.t #2 The following query should explain the issue:
select t1.key from t1 join (select t2.key from t2 join (select * from t3 where 
c_int=1)t3 on t2.key=t3.c_int where ((t2.key=t3.key) and (t2.c_float + 
t3_cfloat > 2)) or ((t2.key=t3.key) and (t2.c_int + t3_cint > 2))) t4 on 
t1.key=t4.key
After transitive inference the filter should be 
((t3.key=1) and (t2.c_float + t3_cfloat > 2)) or ((t3.key=1) and (t2.c_int + 
t3_cint > 2)). 
>From this we can extract (t3.key=1) and push it down.

Also i think we might want to run 
1. constant reduction after transitive inference
2. PPD

Alternatively all of this should be in the same bucket so that one change 
triggers other rules. 
i.e
  a. constant Reduction
  b. Predicate factorization
  c. PPD
  d. Transitive inference

w.r.t #3
At the end we can pull filter only for the case where filter element is 
disjunction of conjunctions whether it is wrapped in an outer conjunction or 
not. But we shouldn't be doing this for disjunctions of disjunctions.

> Simplify filter predicates for CBO
> ----------------------------------
>
>                 Key: HIVE-9069
>                 URL: https://issues.apache.org/jira/browse/HIVE-9069
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 0.14.1
>
>         Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, HIVE-9069.patch
>
>
> Simplify predicates for disjunctive predicates so that can get pushed down to 
> the scan.
> Looks like this is still an issue, some of the filters can be pushed down to 
> the scan.
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=320000000
> set hive.exec.reducers.bytes.per.reducer=100000000
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain  
> select  substr(r_reason_desc,1,20) as r
>        ,avg(ws_quantity) wq
>        ,avg(wr_refunded_cash) ref
>        ,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>       customer_demographics cd2, customer_address, date_dim, reason 
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>    and web_sales.ws_item_sk = web_returns.wr_item_sk
>    and web_sales.ws_order_number = web_returns.wr_order_number
>    and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>    and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
>    and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>    and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>    and reason.r_reason_sk = web_returns.wr_reason_sk
>    and
>    (
>     (
>      cd1.cd_marital_status = 'M'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = '4 yr Degree'
>      and 
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 100.00 and 150.00
>     )
>    or
>     (
>      cd1.cd_marital_status = 'D'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = 'Primary' 
>      and
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 50.00 and 100.00
>     )
>    or
>     (
>      cd1.cd_marital_status = 'U'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = 'Advanced Degree'
>      and
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 150.00 and 200.00
>     )
>    )
>    and
>    (
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('KY', 'GA', 'NM')
>      and ws_net_profit between 100 and 200  
>     )
>     or
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('MT', 'OR', 'IN')
>      and ws_net_profit between 150 and 300  
>     )
>     or
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('WI', 'MO', 'WV')
>      and ws_net_profit between 50 and 250  
>     )
>    )
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 9 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
>         Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
>         Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
>         Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
> (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
>         Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
>         Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>       DagName: mmokhtar_20141111161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_page
>                   filterExpr: wp_web_page_sk is not null (type: boolean)
>                   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: wp_web_page_sk is not null (type: boolean)
>                     Statistics: Num rows: 4602 Data size: 18408 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: wp_web_page_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 4602 Data size: 18408 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 4602 Data size: 18408 Basic 
> stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 10 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_address
>                   filterExpr: ((ca_country = 'United States') and 
> ca_address_sk is not null) (type: boolean)
>                   Statistics: Num rows: 40000000 Data size: 40595195284 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ca_country = 'United States') and 
> ca_address_sk is not null) (type: boolean)
>                     Statistics: Num rows: 20000000 Data size: 3740000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ca_address_sk (type: int), ca_state (type: 
> string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 20000000 Data size: 1800000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 20000000 Data size: 1800000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string)
>             Execution mode: vectorized
>         Map 11 
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: ((d_year = 1998) and d_date_sk is not null) 
> (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((d_year = 1998) and d_date_sk is not null) 
> (type: boolean)
>                     Statistics: Num rows: 652 Data size: 5216 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 652 Data size: 2608 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 652 Data size: 2608 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 652 Data size: 2608 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 326 Data size: 1304 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: web_sales
>                             Partition key expr: ws_sold_date_sk
>                             Statistics: Num rows: 326 Data size: 1304 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                             Target column: ws_sold_date_sk
>                             Target Vertex: Map 9
>             Execution mode: vectorized
>         Map 12 
>             Map Operator Tree:
>                 TableScan
>                   alias: reason
>                   filterExpr: r_reason_sk is not null (type: boolean)
>                   Statistics: Num rows: 72 Data size: 14400 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: r_reason_sk is not null (type: boolean)
>                     Statistics: Num rows: 72 Data size: 7272 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: r_reason_sk (type: int), r_reason_desc 
> (type: string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 72 Data size: 7272 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 72 Data size: 7272 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string)
>             Execution mode: vectorized
>         Map 13 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_returns
>                   filterExpr: (((((wr_refunded_cdemo_sk is not null and 
> wr_item_sk is not null) and wr_order_number is not null) and 
> wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and 
> wr_reason_sk is not null) (type: boolean)
>                   Statistics: Num rows: 2062802370 Data size: 185695406284 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((wr_refunded_cdemo_sk is not null and 
> wr_item_sk is not null) and wr_order_number is not null) and 
> wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and 
> wr_reason_sk is not null) (type: boolean)
>                     Statistics: Num rows: 1875154722 Data size: 58944640412 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: wr_item_sk (type: int), 
> wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), 
> wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_order_number 
> (type: int), wr_fee (type: float), wr_refunded_cash (type: float)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7
>                       Statistics: Num rows: 1875154722 Data size: 58944640412 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col1 (type: int)
>                         Statistics: Num rows: 1875154722 Data size: 
> 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col0 (type: int), _col2 (type: 
> int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: 
> float), _col7 (type: float)
>             Execution mode: vectorized
>         Map 14 
>             Map Operator Tree:
>                 TableScan
>                   alias: cd1
>                   filterExpr: ((cd_demo_sk is not null and cd_marital_status 
> is not null) and cd_education_status is not null) (type: boolean)
>                   Statistics: Num rows: 1920800 Data size: 718379200 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((cd_demo_sk is not null and cd_marital_status 
> is not null) and cd_education_status is not null) (type: boolean)
>                     Statistics: Num rows: 1920800 Data size: 351506400 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: cd_demo_sk (type: int), cd_marital_status 
> (type: string), cd_education_status (type: string)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1920800 Data size: 351506400 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: 
> string), _col2 (type: string)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: string), _col2 (type: string)
>                         Statistics: Num rows: 1920800 Data size: 351506400 
> Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: cd1
>                   filterExpr: ((cd_demo_sk is not null and cd_marital_status 
> is not null) and cd_education_status is not null) (type: boolean)
>                   Statistics: Num rows: 1920800 Data size: 718379200 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((cd_demo_sk is not null and cd_marital_status 
> is not null) and cd_education_status is not null) (type: boolean)
>                     Statistics: Num rows: 1920800 Data size: 351506400 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: cd_demo_sk (type: int), cd_marital_status 
> (type: string), cd_education_status (type: string)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1920800 Data size: 351506400 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 1920800 Data size: 351506400 
> Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string), _col2 (type: 
> string)
>             Execution mode: vectorized
>         Map 9 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_sales
>                   filterExpr: ((ws_web_page_sk is not null and ws_item_sk is 
> not null) and ws_order_number is not null) (type: boolean)
>                   Statistics: Num rows: 21594638446 Data size: 2850189889652 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ws_web_page_sk is not null and ws_item_sk is 
> not null) and ws_order_number is not null) (type: boolean)
>                     Statistics: Num rows: 21591939929 Data size: 604541956128 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ws_item_sk (type: int), ws_web_page_sk 
> (type: int), ws_order_number (type: int), ws_quantity (type: int), 
> ws_sales_price (type: float), ws_net_profit (type: float), ws_sold_date_sk 
> (type: int)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6
>                       Statistics: Num rows: 21591939929 Data size: 
> 604541956128 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col0} {_col2} {_col3} {_col4} {_col5} {_col6}
>                           1 
>                         keys:
>                           0 _col1 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col2, _col3, _col4, _col5, 
> _col6
>                         input vertices:
>                           1 Map 1
>                         Statistics: Num rows: 21591939072 Data size: 
> 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int), _col2 (type: 
> int)
>                           sort order: ++
>                           Map-reduce partition columns: _col0 (type: int), 
> _col2 (type: int)
>                           Statistics: Num rows: 21591939072 Data size: 
> 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col3 (type: int), _col4 (type: 
> float), _col5 (type: float), _col6 (type: int)
>             Execution mode: vectorized
>         Reducer 3 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} 
> {VALUE._col4} {VALUE._col5} {VALUE._col6}
>                   1 {VALUE._col0} {VALUE._col1}
>                 outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, 
> _col7, _col9, _col10
>                 Statistics: Num rows: 1875154688 Data size: 373155782912 
> Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: int), _col10 (type: string), 
> _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), 
> _col6 (type: float), _col7 (type: float), _col9 (type: string)
>                   outputColumnNames: _col0, _col10, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col9
>                   Statistics: Num rows: 1875154688 Data size: 373155782912 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int), _col5 (type: int)
>                     sort order: ++
>                     Map-reduce partition columns: _col0 (type: int), _col5 
> (type: int)
>                     Statistics: Num rows: 1875154688 Data size: 373155782912 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: _col2 (type: int), _col3 (type: int), 
> _col4 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: 
> string), _col10 (type: string)
>         Reducer 4 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4}
>                   1 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} 
> {VALUE._col5} {VALUE._col7} {VALUE._col8}
>                 outputColumnNames: _col3, _col4, _col5, _col6, _col10, 
> _col11, _col12, _col14, _col15, _col17, _col18
>                 Statistics: Num rows: 57653145 Data size: 11472975855 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (((_col17 = 'M') and ((_col18 = '4 yr Degree') 
> and _col4 BETWEEN 100.0 AND 150.0)) or (((_col17 = 'D') and ((_col18 = 
> 'Primary') and _col4 BETWEEN 50.0 AND 100.0)) or ((_col17 = 'U') and ((_col18 
> = 'Advanced Degree') and _col4 BETWEEN 150.0 AND 200.0)))) (type: boolean)
>                   Statistics: Num rows: 57653145 Data size: 11472975855 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col11 (type: int), _col12 (type: int), 
> _col14 (type: float), _col15 (type: float), _col17 (type: string), _col18 
> (type: string), _col3 (type: int), _col5 (type: float), _col6 (type: int), 
> _col10 (type: int)
>                     outputColumnNames: _col10, _col11, _col13, _col14, 
> _col17, _col18, _col3, _col5, _col6, _col9
>                     Statistics: Num rows: 57653145 Data size: 11472975855 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col10 (type: int), _col17 (type: 
> string), _col18 (type: string)
>                       sort order: +++
>                       Map-reduce partition columns: _col10 (type: int), 
> _col17 (type: string), _col18 (type: string)
>                       Statistics: Num rows: 57653145 Data size: 11472975855 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: _col3 (type: int), _col5 (type: 
> float), _col6 (type: int), _col9 (type: int), _col11 (type: int), _col13 
> (type: float), _col14 (type: float)
>         Reducer 5 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 
>                   1 {VALUE._col3} {VALUE._col5} {VALUE._col6} {VALUE._col9} 
> {VALUE._col10} {VALUE._col12} {VALUE._col13}
>                 outputColumnNames: _col6, _col8, _col9, _col12, _col14, 
> _col16, _col17
>                 Statistics: Num rows: 3187317548 Data size: 50997080768 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col12 (type: int), _col14 (type: int), _col16 
> (type: float), _col17 (type: float), _col6 (type: int), _col8 (type: float), 
> _col9 (type: int)
>                   outputColumnNames: _col12, _col14, _col16, _col17, _col6, 
> _col8, _col9
>                   Statistics: Num rows: 3187317548 Data size: 50997080768 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col12 (type: int)
>                     sort order: +
>                     Map-reduce partition columns: _col12 (type: int)
>                     Statistics: Num rows: 3187317548 Data size: 50997080768 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: _col6 (type: int), _col8 (type: 
> float), _col9 (type: int), _col14 (type: int), _col16 (type: float), _col17 
> (type: float)
>         Reducer 6 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col0}
>                   1 {VALUE._col6} {VALUE._col8} {VALUE._col9} {VALUE._col13} 
> {VALUE._col15} {VALUE._col16}
>                 outputColumnNames: _col1, _col9, _col11, _col12, _col17, 
> _col19, _col20
>                 Statistics: Num rows: 1593658752 Data size: 156178557696 
> Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (((_col1) IN ('KY', 'GA', 'NM') and _col11 
> BETWEEN 100 AND 200) or (((_col1) IN ('MT', 'OR', 'IN') and _col11 BETWEEN 
> 150 AND 300) or ((_col1) IN ('WI', 'MO', 'WV') and _col11 BETWEEN 50 AND 
> 250))) (type: boolean)
>                   Statistics: Num rows: 1195244064 Data size: 117133918272 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col17 (type: int), _col19 (type: float), 
> _col20 (type: float), _col9 (type: int), _col12 (type: int)
>                     outputColumnNames: _col11, _col13, _col14, _col3, _col6
>                     Statistics: Num rows: 1195244064 Data size: 14342928768 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 
>                         1 {_col3} {_col11} {_col13} {_col14}
>                       keys:
>                         0 _col0 (type: int)
>                         1 _col6 (type: int)
>                       outputColumnNames: _col5, _col13, _col15, _col16
>                       input vertices:
>                         0 Map 11
>                       Statistics: Num rows: 1334416318 Data size: 16012995816 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col13 (type: int), _col15 (type: 
> float), _col16 (type: float), _col5 (type: int)
>                         outputColumnNames: _col13, _col15, _col16, _col5
>                         Statistics: Num rows: 1334416318 Data size: 
> 16012995816 Basic stats: COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Inner Join 0 to 1
>                           condition expressions:
>                             0 {_col1}
>                             1 {_col5} {_col15} {_col16}
>                           keys:
>                             0 _col0 (type: int)
>                             1 _col13 (type: int)
>                           outputColumnNames: _col1, _col7, _col17, _col18
>                           input vertices:
>                             0 Map 12
>                           Statistics: Num rows: 1334416256 Data size: 
> 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
>                           Select Operator
>                             expressions: _col1 (type: string), _col7 (type: 
> int), _col18 (type: float), _col17 (type: float)
>                             outputColumnNames: _col0, _col1, _col2, _col3
>                             Statistics: Num rows: 1334416256 Data size: 
> 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
>                             Group By Operator
>                               aggregations: avg(_col1), avg(_col2), avg(_col3)
>                               keys: _col0 (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1, _col2, _col3
>                               Statistics: Num rows: 157024 Data size: 
> 15231328 Basic stats: COMPLETE Column stats: COMPLETE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string)
>                                 sort order: +
>                                 Map-reduce partition columns: _col0 (type: 
> string)
>                                 Statistics: Num rows: 157024 Data size: 
> 15231328 Basic stats: COMPLETE Column stats: COMPLETE
>                                 value expressions: _col1 (type: 
> struct<count:bigint,sum:double,input:int>), _col2 (type: 
> struct<count:bigint,sum:double,input:float>), _col3 (type: 
> struct<count:bigint,sum:double,input:float>)
>         Reducer 7 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: avg(VALUE._col0), avg(VALUE._col1), 
> avg(VALUE._col2)
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 112 Data size: 13552 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: substr(_col0, 1, 20) (type: string), _col1 
> (type: double), _col2 (type: double), _col3 (type: double)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 112 Data size: 23296 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: string), _col1 (type: 
> double), _col2 (type: double), _col3 (type: double)
>                     sort order: ++++
>                     Statistics: Num rows: 112 Data size: 23296 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     TopN Hash Memory Usage: 0.04
>         Reducer 8 
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: string), 
> KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), 
> KEY.reducesinkkey3 (type: double)
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 112 Data size: 23296 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                 Limit
>                   Number of rows: 100
>                   Statistics: Num rows: 100 Data size: 20800 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 100 Data size: 20800 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Execution mode: vectorized
>   Stage: Stage-0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

Reply via email to