[ https://issues.apache.org/jira/browse/HIVE-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar updated HIVE-9604: ---------------------------------- Component/s: CBO Affects Version/s: 0.14.0 > CBO : Presence of hybrid join condition sets of join order optimizations > ------------------------------------------------------------------------ > > Key: HIVE-9604 > URL: https://issues.apache.org/jira/browse/HIVE-9604 > Project: Hive > Issue Type: New Feature > Components: CBO > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Laljo John Pullokkaran > > When a query has a join between two tables on hybrid join condition > (conjunction + disjunction) the resulting join order from CBO is suboptimal. > Re-writing the query results in 9x performance improvement. > This was observed in several TPC-DS queries like Q72 and Q64. > The culprit join conditions are : > {code} > catalog_sales.cs_item_sk = inventory.inv_item_sk > and inv_quantity_on_hand < cs_quantity > {code} > This is a simplified version of Q72. > {code} > select count(*) total_cnt > from catalog_sales > join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk) > join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk) > join item on (item.i_item_sk = catalog_sales.cs_item_sk) > join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = > customer_demographics.cd_demo_sk) > join household_demographics on (catalog_sales.cs_bill_hdemo_sk = > household_demographics.hd_demo_sk) > join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk) > join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk) > join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk) > where d1.d_week_seq = d2.d_week_seq > and inv_quantity_on_hand < cs_quantity > and d3.d_date > d1.d_date + 5 > and hd_buy_potential = '1001-5000' > and d1.d_year = 2001 > and hd_buy_potential = '1001-5000' > and cd_marital_status = 'M' > and d1.d_year = 2001; > {code} > If the inventory table is moved down in the join order a more efficient plan > is generated > Modified query > {code} > select count(*) total_cnt > from catalog_sales > join item on (item.i_item_sk = catalog_sales.cs_item_sk) > join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = > customer_demographics.cd_demo_sk) > join household_demographics on (catalog_sales.cs_bill_hdemo_sk = > household_demographics.hd_demo_sk) > join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk) > join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk) > join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk) > join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk) > join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk) > where d1.d_week_seq = d2.d_week_seq > and inv_quantity_on_hand < cs_quantity > and d3.d_date > d1.d_date + 5 > and hd_buy_potential = '1001-5000' > and d1.d_year = 2001 > and hd_buy_potential = '1001-5000' > and cd_marital_status = 'M' > and d1.d_year = 2001; > {code} > Plan with base query notice how catalog_sales joins with inventory first > {code} > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 3 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 11 > (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 13 (BROADCAST_EDGE), Map 2 > (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 > (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) > Reducer 4 <- Map 3 (SIMPLE_EDGE) > Reducer 5 <- Reducer 4 (SIMPLE_EDGE) > DagName: mmokhtar_20141015151414_a08eae06-7250-4833-9e1d-8e58eb69780e:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: (d_date_sk is not null and d_week_seq is not > null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date_sk is not null and d_week_seq is not > null) (type: boolean) > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_week_seq (type: > int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int), > _col1 (type: int) > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 73049 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 73049 Data size: 0 Basic > stats: PARTIAL Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: inventory > Partition key expr: inv_date_sk > Statistics: Num rows: 73049 Data size: 0 Basic > stats: PARTIAL Column stats: COMPLETE > Target column: inv_date_sk > Target Vertex: Map 7 > Execution mode: vectorized > Map 10 > Map Operator Tree: > TableScan > alias: catalog_returns > Statistics: Num rows: 28798881 Data size: 2942039156 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: cr_item_sk (type: int), cr_order_number > (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 28798881 Data size: 230391048 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int), _col1 > (type: int) > Statistics: Num rows: 28798881 Data size: 230391048 > Basic stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 11 > Map Operator Tree: > TableScan > alias: customer_demographics > filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is > not null) (type: boolean) > Statistics: Num rows: 1920800 Data size: 718379200 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((cd_marital_status = 'M') and cd_demo_sk is > not null) (type: boolean) > Statistics: Num rows: 274400 Data size: 24421600 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: cd_demo_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 274400 Data size: 1097600 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 274400 Data size: 1097600 Basic > stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 12 > Map Operator Tree: > TableScan > alias: promotion > Statistics: Num rows: 450 Data size: 530848 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: p_promo_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 450 Data size: 1800 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 450 Data size: 1800 Basic stats: > COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 13 > Map Operator Tree: > TableScan > alias: household_demographics > filterExpr: ((hd_buy_potential = '1001-5000') and > hd_demo_sk is not null) (type: boolean) > Statistics: Num rows: 7200 Data size: 770400 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((hd_buy_potential = '1001-5000') and > hd_demo_sk is not null) (type: boolean) > Statistics: Num rows: 1440 Data size: 138240 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: hd_demo_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1440 Data size: 5760 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1440 Data size: 5760 Basic > stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: d_date_sk is not null (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: d_date_sk is not null (type: boolean) > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_date (type: > string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string) > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: catalog_sales > filterExpr: (((cs_item_sk is not null and cs_bill_hdemo_sk > is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not > null) (type: boolean) > Statistics: Num rows: 286549727 Data size: 37743959324 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((cs_item_sk is not null and cs_bill_hdemo_sk > is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not > null) (type: boolean) > Statistics: Num rows: 284396955 Data size: 7948828532 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: cs_ship_date_sk (type: int), > cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type: > int), cs_promo_sk (type: int), cs_order_number (type: int), cs_quantity > (type: int), cs_sold_date_sk (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, > _col5, _col6, _col7 > Statistics: Num rows: 284396955 Data size: 9086416352 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} > {_col6} {_col7} > 1 {_col1} {_col2} {_col3} > keys: > 0 _col3 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, > _col5, _col6, _col7, _col9, _col10, _col11 > input vertices: > 1 Map 7 > Statistics: Num rows: 275157677926 Data size: > 8805045693632 Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (_col10 < _col6) (type: boolean) > Statistics: Num rows: 91719225975 Data size: > 2935015231200 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int), _col1 (type: > int), _col11 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: > int), _col5 (type: int), _col7 (type: int), _col9 (type: int) > outputColumnNames: _col0, _col1, _col11, _col2, > _col3, _col4, _col5, _col7, _col9 > Statistics: Num rows: 91719225975 Data size: > 2201261423400 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col1} {_col2} {_col3} {_col4} > {_col5} {_col9} {_col11} > 1 {_col1} {_col2} > keys: > 0 _col7 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, > _col4, _col5, _col9, _col11, _col13, _col14 > input vertices: > 1 Map 8 > Statistics: Num rows: 102398861876 Data size: > 12083065701368 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col1} {_col2} {_col3} {_col4} > {_col5} {_col9} {_col13} {_col14} > 1 > keys: > 0 _col11 (type: int), _col14 (type: int) > 1 _col0 (type: int), _col1 (type: int) > outputColumnNames: _col0, _col1, _col2, > _col3, _col4, _col5, _col9, _col13, _col14 > input vertices: > 1 Map 1 > Statistics: Num rows: 8705458 Data size: > 992422212 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col1} {_col3} {_col4} {_col5} > {_col9} {_col13} {_col14} > 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col3, > _col4, _col5, _col9, _col13, _col14 > input vertices: > 1 Map 13 > Statistics: Num rows: 6739709 Data size: > 768326826 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int), _col1 > (type: int), _col13 (type: string), _col14 (type: int), _col3 (type: int), > _col4 (type: int), _col5 (type: int), _col9 (type: int) > outputColumnNames: _col0, _col1, _col13, > _col14, _col3, _col4, _col5, _col9 > Statistics: Num rows: 6739709 Data size: > 768326826 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 > 1 {_col0} {_col3} {_col4} {_col5} > {_col9} {_col13} {_col14} > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > outputColumnNames: _col2, _col5, _col6, > _col7, _col11, _col15, _col16 > input vertices: > 0 Map 11 > Statistics: Num rows: 7051609 Data > size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col11 (type: int), > _col15 (type: string), _col16 (type: int), _col2 (type: int), _col5 (type: > int), _col6 (type: int), _col7 (type: int) > outputColumnNames: _col11, _col15, > _col16, _col2, _col5, _col6, _col7 > Statistics: Num rows: 7051609 Data > size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col1} > 1 {_col5} {_col6} {_col7} > {_col11} {_col15} {_col16} > keys: > 0 _col0 (type: int) > 1 _col2 (type: int) > outputColumnNames: _col1, _col7, > _col8, _col9, _col13, _col17, _col18 > input vertices: > 0 Map 2 > Statistics: Num rows: 7884543 Data > size: 1639984944 Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (UDFToDouble(_col1) > > (UDFToDouble(_col17) + UDFToDouble(5))) (type: boolean) > Statistics: Num rows: 2628181 > Data size: 546661648 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col13 (type: > int), _col18 (type: int), _col7 (type: int), _col8 (type: int), _col9 (type: > int) > outputColumnNames: _col13, > _col18, _col7, _col8, _col9 > Statistics: Num rows: 2628181 > Data size: 52563620 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col1} > 1 {_col7} {_col8} {_col9} > {_col13} {_col18} > keys: > 0 _col0 (type: int) > 1 _col7 (type: int) > outputColumnNames: _col1, > _col9, _col10, _col11, _col15, _col20 > input vertices: > 0 Map 6 > Statistics: Num rows: > 29222304 Data size: 5961350016 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col1} {_col9} > {_col10} {_col11} {_col20} > 1 {_col1} > keys: > 0 _col15 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col1, > _col9, _col10, _col11, _col20, _col27 > input vertices: > 1 Map 9 > Statistics: Num rows: > 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col27 > (type: string), _col1 (type: string), _col20 (type: int), _col9 (type: int), > _col10 (type: int), _col11 (type: int) > outputColumnNames: > _col13, _col15, _col22, _col3, _col4, _col5 > Statistics: Num rows: > 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Right Outer Join0 > to 1 > condition expressions: > 0 {_col0} > 1 {_col3} {_col5} > {_col13} {_col15} {_col22} > keys: > 0 _col0 (type: int) > 1 _col4 (type: int) > outputColumnNames: > _col0, _col4, _col6, _col14, _col16, _col23 > input vertices: > 0 Map 12 > Statistics: Num rows: > 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 > (type: int), _col14 (type: string), _col16 (type: string), _col23 (type: > int), _col4 (type: int), _col6 (type: int) > outputColumnNames: > _col0, _col14, _col16, _col23, _col4, _col6 > Statistics: Num rows: > 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Right Outer > Join0 to 1 > condition > expressions: > 0 > 1 {_col0} > {_col14} {_col16} {_col23} > keys: > 0 _col0 (type: > int), _col1 (type: int) > 1 _col4 (type: > int), _col6 (type: int) > outputColumnNames: > _col2, _col16, _col18, _col25 > input vertices: > 0 Map 10 > Statistics: Num > rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats: > COMPLETE > Select Operator > expressions: > _col18 (type: string), _col16 (type: string), _col25 (type: int), CASE WHEN > (_col2 is null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col2 is not > null) THEN (1) ELSE (0) END (type: int) > > outputColumnNames: _col0, _col1, _col2, _col3, _col4 > Statistics: Num > rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats: > COMPLETE > Group By Operator > aggregations: > count(_col3), count(_col4), count() > keys: _col0 > (type: string), _col1 (type: string), _col2 (type: int) > mode: hash > > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 > Statistics: Num > rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output > Operator > key > expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int) > sort order: > +++ > Map-reduce > partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: > int) > Statistics: > Num rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats: > COMPLETE > value > expressions: _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint) > Execution mode: vectorized > Map 6 > Map Operator Tree: > TableScan > alias: item > filterExpr: i_item_sk is not null (type: boolean) > Statistics: Num rows: 48000 Data size: 68732712 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: i_item_sk is not null (type: boolean) > Statistics: Num rows: 48000 Data size: 9024000 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int), i_item_desc (type: > string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 48000 Data size: 9024000 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 48000 Data size: 9024000 Basic > stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string) > Execution mode: vectorized > Map 7 > Map Operator Tree: > TableScan > alias: inventory > filterExpr: (inv_item_sk is not null and inv_warehouse_sk > is not null) (type: boolean) > Statistics: Num rows: 37584000 Data size: 443485104 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (inv_item_sk is not null and inv_warehouse_sk > is not null) (type: boolean) > Statistics: Num rows: 37584000 Data size: 443485104 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: inv_item_sk (type: int), inv_warehouse_sk > (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 37584000 Data size: 593821104 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 37584000 Data size: 593821104 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: int), _col2 (type: > int), _col3 (type: int) > Execution mode: vectorized > Map 8 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: (((d_year = 2001) and d_date_sk is not null) > and d_week_seq is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((d_year = 2001) and d_date_sk is not null) > and d_week_seq is not null) (type: boolean) > Statistics: Num rows: 652 Data size: 69112 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_date (type: > string), d_week_seq (type: int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 652 Data size: 66504 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 652 Data size: 66504 Basic > stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > int) > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 652 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 652 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: catalog_sales > Partition key expr: cs_sold_date_sk > Statistics: Num rows: 652 Data size: 0 Basic > stats: PARTIAL Column stats: COMPLETE > Target column: cs_sold_date_sk > Target Vertex: Map 3 > Execution mode: vectorized > Map 9 > Map Operator Tree: > TableScan > alias: warehouse > filterExpr: w_warehouse_sk is not null (type: boolean) > Statistics: Num rows: 6 Data size: 6166 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: w_warehouse_sk is not null (type: boolean) > Statistics: Num rows: 6 Data size: 618 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: w_warehouse_sk (type: int), > w_warehouse_name (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 6 Data size: 618 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 6 Data size: 618 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string) > Execution mode: vectorized > Reducer 4 > Reduce Operator Tree: > Group By Operator > aggregations: count(VALUE._col0), count(VALUE._col1), > count(VALUE._col2) > keys: KEY._col0 (type: string), KEY._col1 (type: string), > KEY._col2 (type: int) > mode: mergepartial > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 > Statistics: Num rows: 44400 Data size: 2131200 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: > bigint) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 > Statistics: Num rows: 44400 Data size: 2131200 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col5 (type: bigint), _col0 (type: > string), _col1 (type: string), _col2 (type: int) > sort order: -+++ > Statistics: Num rows: 44400 Data size: 2131200 Basic > stats: COMPLETE Column stats: COMPLETE > TopN Hash Memory Usage: 0.04 > value expressions: _col3 (type: bigint), _col4 (type: > bigint) > Execution mode: vectorized > Reducer 5 > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey1 (type: string), > KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: int), > VALUE._col0 (type: bigint), VALUE._col1 (type: bigint), KEY.reducesinkkey0 > (type: bigint) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 > Statistics: Num rows: 44400 Data size: 2131200 Basic stats: > COMPLETE Column stats: COMPLETE > Limit > Number of rows: 100 > Statistics: Num rows: 100 Data size: 4800 Basic stats: > COMPLETE Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 100 Data size: 4800 Basic stats: > COMPLETE Column stats: COMPLETE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Execution mode: vectorized > Stage: Stage-0 > Fetch Operator > limit: 100 > Processor Tree: > ListSink > Time taken: 16.504 seconds, Fetched: 463 row(s) > {code} > Plan with modified query > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 7 > (BROADCAST_EDGE) > Map 7 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 > (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > DagName: mmokhtar_20150206180606_2f891814-bf41-4888-ab80-dddf4425205f:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: (d_date_sk is not null and d_week_seq is not > null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date_sk is not null and d_week_seq is not > null) (type: boolean) > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_week_seq (type: > int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int), > _col1 (type: int) > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 73049 Data size: 584392 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: inventory > Partition key expr: inv_date_sk > Statistics: Num rows: 73049 Data size: 584392 > Basic stats: COMPLETE Column stats: COMPLETE > Target column: inv_date_sk > Target Vertex: Map 2 > Map 10 > Map Operator Tree: > TableScan > alias: warehouse > filterExpr: w_warehouse_sk is not null (type: boolean) > Statistics: Num rows: 6 Data size: 6166 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: w_warehouse_sk is not null (type: boolean) > Statistics: Num rows: 6 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: w_warehouse_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 6 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 6 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Map 2 > Map Operator Tree: > TableScan > alias: inventory > filterExpr: (inv_item_sk is not null and inv_warehouse_sk > is not null) (type: boolean) > Statistics: Num rows: 37584000 Data size: 443485104 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (inv_item_sk is not null and inv_warehouse_sk > is not null) (type: boolean) > Statistics: Num rows: 37584000 Data size: 593821104 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: inv_item_sk (type: int), inv_warehouse_sk > (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 37584000 Data size: 593821104 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col3 (type: int) > outputColumnNames: _col1, _col2, _col3, _col8, _col17 > input vertices: > 1 Map 7 > Statistics: Num rows: 937245073 Data size: > 14995921168 Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (_col2 < _col8) (type: boolean) > Statistics: Num rows: 312415024 Data size: > 4998640384 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col17 (type: int), _col1 (type: > int), _col3 (type: int) > outputColumnNames: _col13, _col18, _col20 > Statistics: Num rows: 312415024 Data size: > 2499320192 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int), _col1 (type: int) > 1 _col20 (type: int), _col13 (type: int) > outputColumnNames: _col20 > input vertices: > 0 Map 1 > Statistics: Num rows: 26560 Data size: 106240 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col20 (type: int) > 1 _col0 (type: int) > input vertices: > 1 Map 10 > Statistics: Num rows: 26560 Data size: 0 > Basic stats: PARTIAL Column stats: COMPLETE > Group By Operator > aggregations: count() > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 8 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 8 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col0 (type: bigint) > Map 4 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: d_date_sk is not null (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: d_date_sk is not null (type: boolean) > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_date (type: > string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 73049 Data size: 7158802 Basic > stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string) > Map 5 > Map Operator Tree: > TableScan > alias: item > filterExpr: i_item_sk is not null (type: boolean) > Statistics: Num rows: 48000 Data size: 68732712 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: i_item_sk is not null (type: boolean) > Statistics: Num rows: 48000 Data size: 192000 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 48000 Data size: 192000 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 48000 Data size: 192000 Basic > stats: COMPLETE Column stats: COMPLETE > Map 6 > Map Operator Tree: > TableScan > alias: customer_demographics > filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is > not null) (type: boolean) > Statistics: Num rows: 1920800 Data size: 718379200 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((cd_marital_status = 'M') and cd_demo_sk is > not null) (type: boolean) > Statistics: Num rows: 274400 Data size: 24421600 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: cd_demo_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 274400 Data size: 1097600 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 274400 Data size: 1097600 Basic > stats: COMPLETE Column stats: COMPLETE > Map 7 > Map Operator Tree: > TableScan > alias: catalog_sales > filterExpr: (((cs_bill_hdemo_sk is not null and > cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk > is not null) (type: boolean) > Statistics: Num rows: 286549727 Data size: 37743959324 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((cs_bill_hdemo_sk is not null and > cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk > is not null) (type: boolean) > Statistics: Num rows: 284396955 Data size: 6814100200 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: cs_ship_date_sk (type: int), > cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type: > int), cs_quantity (type: int), cs_sold_date_sk (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, > _col5 > Statistics: Num rows: 284396955 Data size: 6814100200 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col5 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, > _col7, _col8 > input vertices: > 1 Map 8 > Statistics: Num rows: 101715202 Data size: > 12002393836 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col3, _col4, > _col7, _col8 > input vertices: > 1 Map 9 > Statistics: Num rows: 20343040 Data size: > 2319106560 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > outputColumnNames: _col2, _col5, _col6, _col9, > _col10 > input vertices: > 0 Map 6 > Statistics: Num rows: 2906149 Data size: 34873788 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col10 (type: int), _col2 (type: > int), _col5 (type: int), _col6 (type: int), _col9 (type: string) > outputColumnNames: _col10, _col2, _col5, _col6, > _col9 > Statistics: Num rows: 2906149 Data size: > 34873788 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col5 (type: int) > outputColumnNames: _col3, _col6, _col7, > _col10, _col11 > input vertices: > 0 Map 5 > Statistics: Num rows: 2906149 Data size: > 34873788 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col10 (type: string), _col11 > (type: int), _col3 (type: int), _col6 (type: int), _col7 (type: int) > outputColumnNames: _col10, _col11, _col3, > _col6, _col7 > Statistics: Num rows: 2906149 Data size: > 34873788 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col3 (type: int) > outputColumnNames: _col1, _col8, _col9, > _col12, _col13 > input vertices: > 0 Map 4 > Statistics: Num rows: 2906149 Data size: > 296427198 Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (UDFToDouble(_col1) > > (UDFToDouble(_col12) + 5.0)) (type: boolean) > Statistics: Num rows: 968716 Data size: > 98809032 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col13 (type: int), > _col8 (type: int), _col9 (type: int) > outputColumnNames: _col13, _col3, > _col4 > Statistics: Num rows: 968716 Data > size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col3 (type: int) > sort order: + > Map-reduce partition columns: _col3 > (type: int) > Statistics: Num rows: 968716 Data > size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col4 (type: > int), _col13 (type: int) > Map 8 > Map Operator Tree: > TableScan > alias: d1 > filterExpr: (((d_year = 2001) and d_date_sk is not null) > and d_week_seq is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((d_year = 2001) and d_date_sk is not null) > and d_week_seq is not null) (type: boolean) > Statistics: Num rows: 652 Data size: 69112 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int), d_date (type: > string), d_week_seq (type: int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 652 Data size: 66504 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 652 Data size: 66504 Basic > stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > int) > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 652 Data size: 66504 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 652 Data size: 66504 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: catalog_sales > Partition key expr: cs_sold_date_sk > Statistics: Num rows: 652 Data size: 66504 Basic > stats: COMPLETE Column stats: COMPLETE > Target column: cs_sold_date_sk > Target Vertex: Map 7 > Map 9 > Map Operator Tree: > TableScan > alias: household_demographics > filterExpr: ((hd_buy_potential = '1001-5000') and > hd_demo_sk is not null) (type: boolean) > Statistics: Num rows: 7200 Data size: 770400 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((hd_buy_potential = '1001-5000') and > hd_demo_sk is not null) (type: boolean) > Statistics: Num rows: 1440 Data size: 138240 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: hd_demo_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1440 Data size: 5760 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1440 Data size: 5760 Basic > stats: COMPLETE Column stats: COMPLETE > Reducer 3 > Reduce Operator Tree: > Group By Operator > aggregations: count(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE > Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE > Column stats: COMPLETE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)