[ https://issues.apache.org/jira/browse/HIVE-8526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192259#comment-14192259 ]
Mostafa Mokhtar commented on HIVE-8526: --------------------------------------- [~rhbutani] [~hagleitn] [~jpullokkaran] Issue resolved in latest build {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 8 (BROADCAST_EDGE) Map 6 <- Map 5 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) Map 9 <- Map 7 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 6 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) DagName: mmokhtar_20141031145858_4f12c0f7-13ef-46e8-9535-81b02cc8a937:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address filterExpr: ca_address_sk is not null (type: boolean) Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ca_address_sk is not null (type: boolean) Statistics: Num rows: 40000000 Data size: 7640000000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ca_address_sk (type: int), ca_county (type: string), ca_zip (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 40000000 Data size: 7640000000 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} 1 {_col1} {_col2} keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col3, _col4 input vertices: 0 Map 8 Statistics: Num rows: 80000000 Data size: 14960000000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 80000000 Data size: 14960000000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col3 (type: string), _col4 (type: string) Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: date_dim filterExpr: (((d_qoy = 2) and (d_year = 2000)) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((d_qoy = 2) and (d_year = 2000)) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 635 Data size: 7620 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 635 Data size: 2540 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 635 Data size: 2540 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 635 Data size: 2540 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 317 Data size: 1268 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: web_sales Partition key expr: ws_sold_date_sk Statistics: Num rows: 317 Data size: 1268 Basic stats: COMPLETE Column stats: COMPLETE Target column: ws_sold_date_sk Target Vertex: Map 6 Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: web_sales filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean) Statistics: Num rows: 21594638446 Data size: 2850189889652 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean) Statistics: Num rows: 21591944812 Data size: 345449542868 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_item_sk (type: int), ws_bill_customer_sk (type: int), ws_sales_price (type: float), ws_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 21591944812 Data size: 345449542868 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} 1 keys: 0 _col3 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2 input vertices: 1 Map 5 Statistics: Num rows: 24138881964 Data size: 289666583568 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} {_col2} 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col1, _col2 input vertices: 1 Map 9 Statistics: Num rows: 6034720256 Data size: 48277762048 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col1 (type: int), _col2 (type: float) outputColumnNames: _col1, _col2 Statistics: Num rows: 6034720256 Data size: 48277762048 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 6034720256 Data size: 48277762048 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: float) Execution mode: vectorized Map 7 Map Operator Tree: TableScan alias: item filterExpr: ((i_item_sk) IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) and i_item_id is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((i_item_sk) IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) and i_item_id is not null) (type: boolean) Statistics: Num rows: 231000 Data size: 24024000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_id (type: string) outputColumnNames: _col0 Statistics: Num rows: 231000 Data size: 23100000 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 115500 Data size: 11550000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 115500 Data size: 11550000 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 8 Map Operator Tree: TableScan alias: customer filterExpr: (c_current_addr_sk is not null and c_customer_sk is not null) (type: boolean) Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (c_current_addr_sk is not null and c_customer_sk is not null) (type: boolean) Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: c_customer_sk (type: int), c_current_addr_sk (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col0 (type: int) Execution mode: vectorized Map 9 Map Operator Tree: TableScan alias: item filterExpr: (i_item_id is not null and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (i_item_id is not null and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 48048000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 462000 Data size: 48048000 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Left Semi Join 0 to 1 condition expressions: 0 {_col0} 1 keys: 0 _col1 (type: string) 1 _col0 (type: string) outputColumnNames: _col0 input vertices: 1 Map 7 Statistics: Num rows: 343407 Data size: 1373628 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 343407 Data size: 1373628 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 343407 Data size: 1373628 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Reducer 2 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col2} {VALUE._col3} 1 {VALUE._col1} outputColumnNames: _col3, _col4, _col7 Statistics: Num rows: 137325482305 Data size: 26229167120255 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col4 (type: string), _col3 (type: string), _col7 (type: float) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 137325482305 Data size: 26229167120255 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: sum(_col2) keys: _col0 (type: string), _col1 (type: string) mode: hash outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 137325482305 Data size: 26778469049475 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string) sort order: ++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string) Statistics: Num rows: 137325482305 Data size: 26778469049475 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: double) Reducer 3 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string) mode: mergepartial outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 137325482305 Data size: 27190445496390 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: string), _col1 (type: string), _col2 (type: double) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 137325482305 Data size: 27190445496390 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string) sort order: ++ Statistics: Num rows: 137325482305 Data size: 27190445496390 Basic stats: COMPLETE Column stats: COMPLETE TopN Hash Memory Usage: 0.04 value expressions: _col2 (type: double) Execution mode: vectorized Reducer 4 Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string), VALUE._col0 (type: double) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 137325482305 Data size: 27190445496390 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 19800 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 100 Data size: 19800 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Stage: Stage-0 Fetch Operator limit: 100 Processor Tree: ListSink {code} > Hive : CBO incorrect join order in TPC-DS Q45 as self join selectivity has > incorrect CE > --------------------------------------------------------------------------------------- > > Key: HIVE-8526 > URL: https://issues.apache.org/jira/browse/HIVE-8526 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Harish Butani > Priority: Critical > Fix For: 0.14.0 > > Attachments: HIVE-8526.1.patch > > > The join order has Item joined last where it should be joined first > Query > {code} > select ca_zip, ca_county, sum(ws_sales_price) > from > web_sales > JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk > JOIN customer_address ON customer.c_current_addr_sk = > customer_address.ca_address_sk > JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk > JOIN item ON web_sales.ws_item_sk = item.i_item_sk > where > ( item.i_item_id in (select i_item_id > from item i2 > where i2.i_item_sk in (2, 3, 5, 7, 11, 13, 17, > 19, 23, 29) > ) > ) > and d_qoy = 2 and d_year = 2000 > group by ca_zip, ca_county > order by ca_zip, ca_county > limit 100 > {code} > Plan > {code} > 2014-10-20 18:43:16,521 DEBUG [main]: parse.SemanticAnalyzer > (SemanticAnalyzer.java:apply(12330)) - HiveSortRel(fetch=[100]): rowcount = > 1.710158597922807E7, cumulative cost = {7.169080587598123E10 rows, > 3.420317295845614E7 cpu, 0.0 io}, id = 579 > HiveSortRel(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]): rowcount = > 1.710158597922807E7, cumulative cost = {6.827294821015483E10 rows, > 1.710158697922807E7 cpu, 0.0 io}, id = 577 > HiveProjectRel(ca_zip=[$0], ca_county=[$1], _o__c2=[$2]): rowcount = > 1.710158597922807E7, cumulative cost = {6.485509054432843E10 rows, 1.0 cpu, > 0.0 io}, id = 575 > HiveAggregateRel(group=[{0, 1}], agg#0=[sum($2)]): rowcount = > 1.710158597922807E7, cumulative cost = {6.485509054432843E10 rows, 1.0 cpu, > 0.0 io}, id = 573 > HiveProjectRel($f0=[$2], $f1=[$1], $f2=[$0]): rowcount = > 6.0197670310147226E7, cumulative cost = {6.485509054432843E10 rows, 1.0 cpu, > 0.0 io}, id = 571 > HiveProjectRel(ws_sales_price=[$2], ca_county=[$7], ca_zip=[$8]): > rowcount = 6.0197670310147226E7, cumulative cost = {6.485509054432843E10 > rows, 1.0 cpu, 0.0 io}, id = 569 > HiveFilterRel(condition=[AND(=($11, 2), =($10, 2000))]): rowcount > = 6.0197670310147226E7, cumulative cost = {6.485509054432843E10 rows, 1.0 > cpu, 0.0 io}, id = 567 > SemiJoinRel(condition=[=($13, $14)], joinType=[inner]): > rowcount = 3.371069537368245E10, cumulative cost = {6.485509054432843E10 > rows, 1.0 cpu, 0.0 io}, id = 565 > HiveProjectRel(ws_item_sk=[$0], ws_bill_customer_sk=[$1], > ws_sales_price=[$2], ws_sold_date_sk=[$3], c_customer_sk=[$9], > c_current_addr_sk=[$10], ca_address_sk=[$11], ca_county=[$12], ca_zip=[$13], > d_date_sk=[$6], d_year=[$7], d_qoy=[$8], i_item_sk=[$4], i_item_id=[$5]): > rowcount = 3.371069537368245E10, cumulative cost = {6.485509054332843E10 > rows, 0.0 cpu, 0.0 io}, id = 669 > HiveJoinRel(condition=[=($1, $9)], joinType=[inner]): > rowcount = 3.371069537368245E10, cumulative cost = {6.485509054332843E10 > rows, 0.0 cpu, 0.0 io}, id = 667 > HiveJoinRel(condition=[=($3, $6)], joinType=[inner]): > rowcount = 2.1594638446E10, cumulative cost = {4.3189811941E10 rows, 0.0 cpu, > 0.0 io}, id = 664 > HiveJoinRel(condition=[=($0, $4)], joinType=[inner]): > rowcount = 2.1594638446E10, cumulative cost = {2.1595100446E10 rows, 0.0 cpu, > 0.0 io}, id = 601 > HiveProjectRel(ws_item_sk=[$2], > ws_bill_customer_sk=[$3], ws_sales_price=[$20], ws_sold_date_sk=[$33]): > rowcount = 2.1594638446E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id > = 497 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.web_sales]]): > rowcount = 2.1594638446E10, cumulative cost = {0}, id = 341 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): > rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 555 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 340 > HiveProjectRel(d_date_sk=[$0], d_year=[$6], > d_qoy=[$10]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 > io}, id = 551 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): > rowcount = 73049.0, cumulative cost = {0}, id = 342 > HiveJoinRel(condition=[=($1, $2)], joinType=[inner]): > rowcount = 7.064015632843196E7, cumulative cost = {1.2E8 rows, 0.0 cpu, 0.0 > io}, id = 598 > HiveProjectRel(c_customer_sk=[$0], > c_current_addr_sk=[$4]): rowcount = 8.0E7, cumulative cost = {0.0 rows, 0.0 > cpu, 0.0 io}, id = 500 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer]]): > rowcount = 8.0E7, cumulative cost = {0}, id = 343 > HiveProjectRel(ca_address_sk=[$0], ca_county=[$7], > ca_zip=[$9]): rowcount = 4.0E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 > io}, id = 547 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_address]]): > rowcount = 4.0E7, cumulative cost = {0}, id = 339 > HiveProjectRel(i_item_id=[$1]): rowcount = 1.05119214745814, > cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 563 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): rowcount = > 1.05119214745814, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 561 > HiveFilterRel(condition=[in($0, 2, 3, 5, 7, 11, 13, 17, > 19, 23, 29)]): rowcount = 1.05119214745814, cumulative cost = {0.0 rows, 0.0 > cpu, 0.0 io}, id = 559 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 340 > {code} > Then I rewrote the query trying to force CBO to generate the correct join > order > {code} > with items as (select i_item_sk from > item where > ( item.i_item_id in (select i_item_id > from item i2 > where i2.i_item_sk in (2, 3, 5, 7, 11, 13, 17, > 19, 23, 29) > ) > ) > ) > select ca_zip, ca_county, sum(ws_sales_price) > from > web_sales > JOIN items ON web_sales.ws_item_sk = items.i_item_sk > JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk > JOIN customer_address ON customer.c_current_addr_sk = > customer_address.ca_address_sk > JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk > where > d_qoy = 2 and d_year = 2000 > group by ca_zip, ca_county > order by ca_zip, ca_county > limit 100 > {code} > But the correct join order wasn't generated because CE for item x item + > filter has a selectivity of 1. > {code} > 2014-10-20 18:46:27,120 DEBUG [main]: parse.SemanticAnalyzer > (SemanticAnalyzer.java:apply(12330)) - HiveSortRel(fetch=[100]): rowcount = > 1.6595391288544238E7, cumulative cost = {2.8364280421639153E10 rows, > 3.3190782577088475E7 cpu, 0.0 io}, id = 1291 > HiveSortRel(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]): rowcount = > 1.6595391288544238E7, cumulative cost = {2.505357243157397E10 rows, > 1.6595391288544238E7 cpu, 0.0 io}, id = 1289 > HiveProjectRel(ca_zip=[$0], ca_county=[$1], _o__c2=[$2]): rowcount = > 1.6595391288544238E7, cumulative cost = {2.174286444150879E10 rows, 0.0 cpu, > 0.0 io}, id = 1287 > HiveAggregateRel(group=[{0, 1}], agg#0=[sum($2)]): rowcount = > 1.6595391288544238E7, cumulative cost = {2.174286444150879E10 rows, 0.0 cpu, > 0.0 io}, id = 1285 > HiveProjectRel($f0=[$9], $f1=[$8], $f2=[$2]): rowcount = > 6.019767031014723E7, cumulative cost = {2.174286444150879E10 rows, 0.0 cpu, > 0.0 io}, id = 1283 > HiveProjectRel(ws_item_sk=[$5], ws_bill_customer_sk=[$6], > ws_sales_price=[$7], ws_sold_date_sk=[$8], i_item_sk=[$12], > c_customer_sk=[$0], c_current_addr_sk=[$1], ca_address_sk=[$2], > ca_county=[$3], ca_zip=[$4], d_date_sk=[$9], d_year=[$10], d_qoy=[$11]): > rowcount = 6.019767031014723E7, cumulative cost = {2.174286444150879E10 rows, > 0.0 cpu, 0.0 io}, id = 1380 > HiveJoinRel(condition=[=($6, $0)], joinType=[inner]): rowcount = > 6.019767031014723E7, cumulative cost = {2.174286444150879E10 rows, 0.0 cpu, > 0.0 io}, id = 1378 > HiveJoinRel(condition=[=($1, $2)], joinType=[inner]): rowcount > = 7.064015632843196E7, cumulative cost = {1.2E8 rows, 0.0 cpu, 0.0 io}, id = > 1309 > HiveProjectRel(c_customer_sk=[$0], c_current_addr_sk=[$4]): > rowcount = 8.0E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1269 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer]]): > rowcount = 8.0E7, cumulative cost = {0}, id = 1035 > HiveProjectRel(ca_address_sk=[$0], ca_county=[$7], > ca_zip=[$9]): rowcount = 4.0E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 > io}, id = 1273 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_address]]): > rowcount = 4.0E7, cumulative cost = {0}, id = 1032 > HiveJoinRel(condition=[=($0, $7)], joinType=[inner]): rowcount > = 3.856185436785714E7, cumulative cost = {2.16336624308125E10 rows, 0.0 cpu, > 0.0 io}, id = 1376 > HiveJoinRel(condition=[=($3, $4)], joinType=[inner]): > rowcount = 3.856185436785714E7, cumulative cost = {2.159463857644464E10 rows, > 0.0 cpu, 0.0 io}, id = 1316 > HiveProjectRel(ws_item_sk=[$2], ws_bill_customer_sk=[$3], > ws_sales_price=[$20], ws_sold_date_sk=[$33]): rowcount = 2.1594638446E10, > cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1205 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.web_sales]]): > rowcount = 2.1594638446E10, cumulative cost = {0}, id = 1033 > HiveProjectRel(d_date_sk=[$0], d_year=[$6], d_qoy=[$10]): > rowcount = 130.44464285714287, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 1279 > HiveFilterRel(condition=[AND(=($10, 2), =($6, 2000))]): > rowcount = 130.44464285714287, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 1277 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): > rowcount = 73049.0, cumulative cost = {0}, id = 1034 > HiveProjectRel(i_item_sk=[$0]): rowcount = 462000.0, > cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1265 > HiveFilterRel(condition=[=(1, 1)]): rowcount = 462000.0, > cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1263 > SemiJoinRel(condition=[=($1, $2)], joinType=[inner]): > rowcount = 462000.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1261 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): > rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1253 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 1024 > HiveProjectRel(i_item_id=[$1]): rowcount = > 1.05119214745814, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1259 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): > rowcount = 1.05119214745814, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 1257 > HiveFilterRel(condition=[in($0, 2, 3, 5, 7, 11, 13, > 17, 19, 23, 29)]): rowcount = 1.05119214745814, cumulative cost = {0.0 rows, > 0.0 cpu, 0.0 io}, id = 1255 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 1024 > {code} > This query generates the correct join order > {code} > with items as (select i_item_sk from > item where > item.i_item_id in (select i_item_id > from item i2 > where i2.i_item_sk in (2, 3, 5, 7, 11, 13, 17, > 19, 23, 29) > ) > > ), > ws as ( > select ws_bill_customer_sk,ws_sales_price,ws_sold_date_sk > from web_sales > JOIN items ON web_sales.ws_item_sk = items.i_item_sk > ) > select ca_zip, ca_county, sum(ws_sales_price) > from ws > JOIN customer ON ws.ws_bill_customer_sk = customer.c_customer_sk > JOIN customer_address ON customer.c_current_addr_sk = > customer_address.ca_address_sk > JOIN date_dim ON ws.ws_sold_date_sk = date_dim.d_date_sk > where d_qoy = 2 and d_year = 2000 > group by ca_zip, ca_county > order by ca_zip, ca_county > limit 100 > {code} > Plan > {code} > 2014-10-20 19:13:15,989 DEBUG [main]: parse.SemanticAnalyzer > (SemanticAnalyzer.java:apply(12330)) - HiveSortRel(fetch=[100]): rowcount = > 1.6595391288544238E7, cumulative cost = {4.99203570142713E10 rows, > 3.3190783577088475E7 cpu, 0.0 io}, id = 4367 > HiveSortRel(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]): rowcount = > 1.6595391288544238E7, cumulative cost = {4.6609649024206116E10 rows, > 1.6595392288544238E7 cpu, 0.0 io}, id = 4365 > HiveProjectRel(ca_zip=[$0], ca_county=[$1], _o__c2=[$2]): rowcount = > 1.6595391288544238E7, cumulative cost = {4.329894103414093E10 rows, 1.0 cpu, > 0.0 io}, id = 4363 > HiveAggregateRel(group=[{0, 1}], agg#0=[sum($2)]): rowcount = > 1.6595391288544238E7, cumulative cost = {4.329894103414093E10 rows, 1.0 cpu, > 0.0 io}, id = 4361 > HiveProjectRel($f0=[$7], $f1=[$6], $f2=[$1]): rowcount = > 6.019767031014723E7, cumulative cost = {4.329894103414093E10 rows, 1.0 cpu, > 0.0 io}, id = 4359 > HiveProjectRel(ws_bill_customer_sk=[$5], ws_sales_price=[$6], > ws_sold_date_sk=[$7], c_customer_sk=[$0], c_current_addr_sk=[$1], > ca_address_sk=[$2], ca_county=[$3], ca_zip=[$4], d_date_sk=[$8], d_year=[$9], > d_qoy=[$10]): rowcount = 6.019767031014723E7, cumulative cost = > {4.329894103414093E10 rows, 1.0 cpu, 0.0 io}, id = 4426 > HiveJoinRel(condition=[=($5, $0)], joinType=[inner]): rowcount = > 6.019767031014723E7, cumulative cost = {4.329894103414093E10 rows, 1.0 cpu, > 0.0 io}, id = 4424 > HiveJoinRel(condition=[=($1, $2)], joinType=[inner]): rowcount > = 7.064015632843196E7, cumulative cost = {1.2E8 rows, 0.0 cpu, 0.0 io}, id = > 4392 > HiveProjectRel(c_customer_sk=[$0], c_current_addr_sk=[$4]): > rowcount = 8.0E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4345 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer]]): > rowcount = 8.0E7, cumulative cost = {0}, id = 4101 > HiveProjectRel(ca_address_sk=[$0], ca_county=[$7], > ca_zip=[$9]): rowcount = 4.0E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 > io}, id = 4349 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_address]]): > rowcount = 4.0E7, cumulative cost = {0}, id = 4099 > HiveJoinRel(condition=[=($2, $3)], joinType=[inner]): rowcount > = 3.856185436785714E7, cumulative cost = {4.318973902344464E10 rows, 1.0 cpu, > 0.0 io}, id = 4395 > HiveProjectRel(ws_bill_customer_sk=[$1], ws_sales_price=[$2], > ws_sold_date_sk=[$3]): rowcount = 2.1594638446E10, cumulative cost = > {2.1595100447E10 rows, 1.0 cpu, 0.0 io}, id = 4343 > HiveProjectRel(ws_item_sk=[$0], ws_bill_customer_sk=[$1], > ws_sales_price=[$2], ws_sold_date_sk=[$3], i_item_sk=[$4]): rowcount = > 2.1594638446E10, cumulative cost = {2.1595100447E10 rows, 1.0 cpu, 0.0 io}, > id = 4388 > HiveJoinRel(condition=[=($0, $4)], joinType=[inner]): > rowcount = 2.1594638446E10, cumulative cost = {2.1595100447E10 rows, 1.0 cpu, > 0.0 io}, id = 4383 > HiveProjectRel(ws_item_sk=[$2], > ws_bill_customer_sk=[$3], ws_sales_price=[$20], ws_sold_date_sk=[$33]): > rowcount = 2.1594638446E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id > = 4277 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.web_sales]]): > rowcount = 2.1594638446E10, cumulative cost = {0}, id = 4096 > HiveProjectRel(i_item_sk=[$0]): rowcount = 462000.0, > cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 4339 > HiveFilterRel(condition=[=(1, 1)]): rowcount = > 462000.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 4337 > SemiJoinRel(condition=[=($1, $2)], > joinType=[inner]): rowcount = 462000.0, cumulative cost = {1.0 rows, 1.0 cpu, > 0.0 io}, id = 4335 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): > rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4327 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 4088 > HiveProjectRel(i_item_id=[$1]): rowcount = > 1.05119214745814, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4333 > HiveProjectRel(i_item_sk=[$0], i_item_id=[$1]): > rowcount = 1.05119214745814, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 4331 > HiveFilterRel(condition=[in($0, 2, 3, 5, 7, > 11, 13, 17, 19, 23, 29)]): rowcount = 1.05119214745814, cumulative cost = > {0.0 rows, 0.0 cpu, 0.0 io}, id = 4329 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 4088 > HiveProjectRel(d_date_sk=[$0], d_year=[$6], d_qoy=[$10]): > rowcount = 130.44464285714287, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 4355 > HiveFilterRel(condition=[AND(=($10, 2), =($6, 2000))]): > rowcount = 130.44464285714287, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, > id = 4353 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): > rowcount = 73049.0, cumulative cost = {0}, id = 4100 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)