[
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141440#comment-14141440
]
Mostafa Mokhtar commented on HIVE-8196:
---------------------------------------
[~hagleitn]
> Joining on partition columns with fetch column stats enabled results in very
> small CE which negatively affects query performance
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8196
> URL: https://issues.apache.org/jira/browse/HIVE-8196
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Prasanth J
> Priority: Critical
> Labels: performance
> Fix For: 0.14.0
>
>
> To make the most of dynamic partition pruning, joins should be on the
> partitioning columns, so that partitions of the fact table are pruned
> dynamically based on the qualifying column keys from the dimension table.
> However, this type of join negatively affects cardinality estimates when
> fetch column stats is enabled.
> Currently we don't have statistics for partition columns, and as a result NDV
> is set to the row count, which negatively affects the estimated join
> selectivity.
> Workaround is to capture statistics for partition columns or use the number of
> partitions in case dynamic partitioning is used.
> StatsUtils.getColStatisticsFromExpression is where the distinct count gets
> set to the row count:
> {code}
> if (encd.getIsPartitionColOrVirtualCol()) {
> // virtual columns
> colType = encd.getTypeInfo().getTypeName();
> countDistincts = numRows;
> oi = encd.getWritableObjectInspector();
> {code}
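A minimal sketch of why setting countDistincts to numRows shrinks the join estimate. It assumes the textbook equi-join formula rows(R) * rows(S) / max(NDV(R), NDV(S)); the exact rule Hive applies may differ. The class name, method name, and all row and partition counts below are hypothetical and chosen only for illustration.

```java
public class JoinCardinalitySketch {

    // Textbook equi-join estimate (an assumption, not Hive's actual code):
    // |R join S| ~= |R| * |S| / max(NDV_R, NDV_S)
    static long estimateJoinRows(long leftRows, long rightRows,
                                 long leftNdv, long rightNdv) {
        long maxNdv = Math.max(1L, Math.max(leftNdv, rightNdv));
        // Use double arithmetic so the product of two large row counts cannot overflow.
        double estimate = (double) leftRows * (double) rightRows / (double) maxNdv;
        return Math.max(1L, (long) estimate);
    }

    public static void main(String[] args) {
        long factRows = 1_000_000_000L; // hypothetical store_sales row count
        long dimRows = 365L;            // hypothetical date_dim rows left after d_year = 1998
        long dimNdv = 365L;             // d_date_sk is unique in the dimension table
        long numPartitions = 1_800L;    // hypothetical number of ss_sold_date_sk partitions

        // NDV of the partition column set to the row count, as in the snippet above.
        long withRowCountNdv = estimateJoinRows(factRows, dimRows, factRows, dimNdv);

        // NDV of the partition column set to the number of partitions (the workaround).
        long withPartitionNdv = estimateJoinRows(factRows, dimRows, numPartitions, dimNdv);

        System.out.println("NDV = row count:        " + withRowCountNdv);
        System.out.println("NDV = partition count:  " + withPartitionNdv);
    }
}
```

Under these assumed numbers the first estimate collapses to roughly the size of the filtered dimension table, while the second stays several orders of magnitude larger, which matches the symptom described in the issue.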
> Query used to repro the issue:
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date
> from store_sales, date_dim
> where
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and
> date_dim.d_year = 1998;
> {code}
> Plan
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)