[
https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535756#comment-14535756
]
Sushanth Sowmyan commented on HIVE-10412:
-----------------------------------------
Removing fix version of 1.2.0 in preparation of release, since this is not a
blocker for 1.2.0.
> CBO : Calculate join selectivity when computing HiveJoin cost
> -------------------------------------------------------------
>
> Key: HIVE-10412
> URL: https://issues.apache.org/jira/browse/HIVE-10412
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Mostafa Mokhtar
> Assignee: Laljo John Pullokkaran
>
> This is from TPC-DS Q7
> Because we don't compute the selectivity of sub-expression in a HiveJoin we
> assume that selective and non-selective joins have the similar cost.
> {code}
> select i_item_id,
> avg(ss_quantity) agg1,
> avg(ss_list_price) agg2,
> avg(ss_coupon_amt) agg3,
> avg(ss_sales_price) agg4
> from store_sales, customer_demographics, item
> where store_sales.ss_item_sk = item.i_item_sk and
> store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
> cd_gender = 'F' and
> cd_marital_status = 'W' and
> cd_education_status = 'Primary'
> group by i_item_id
> order by i_item_id
> limit 100
> {code}
> Cardinality
> {code}
> item 462,000
> customer_demographics 1,920,800
> store_sales 82,510,879,939
> {code}
> NDVs
> {code}
> item.i_item_sk 439501
> customer_demographics.cd_demo_sk 1835839
> store_sales.ss_cdemo_sk 1835839
> {code}
> From the logs
> {code}
> 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
> HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none],
> cost=[not available])
> HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin],
> cost=[{8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666
> io}])
> HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9],
> ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
> HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2],
> cd_education_status=[$3])
> HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
>
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
> HiveProject(i_item_sk=[$0], i_item_id=[$1])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8
> rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8
> rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
> 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
> HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not
> available])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin],
> cost=[{8.2511341939E10 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}])
> HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9],
> ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
> HiveProject(i_item_sk=[$0], i_item_id=[$1])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
> HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2],
> cd_education_status=[$3])
> HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
>
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
> 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.251089518344444E10
> rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
> 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel
> (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.251089518344444E10
> rows, 2.324083308641975E8 cpu, 275417.5666666666 io}
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)