[ https://issues.apache.org/jira/browse/HIVE-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953387#comment-14953387 ]
Pengcheng Xiong commented on HIVE-12094:
----------------------------------------
[~gopalv], yes, the stats estimation for aggregation is very rough right now.
> nDV of aggregate columns tend to be log scale - not unique
> ----------------------------------------------------------
>
> Key: HIVE-12094
> URL: https://issues.apache.org/jira/browse/HIVE-12094
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Gopal V
>
> Stats for aggregate columns are not processed correctly when the aggregate is
> treated as a simple nDV. For example, the query
> {code}
> select count(distinct l_suppkey) from lineitem group by l_orderkey having
> count(distinct l_suppkey) = 1
> {code}
> will mis-estimate the cardinality of the output by a significant margin.
> The nDV of an aggregate column is generally log-scale and skews towards a very
> low number, which StatsRulesProcFactory does not account for.
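
The effect described in the report can be illustrated with a small simulation. The sketch below is not Hive code. It uses synthetic data (the row counts, the supplier range, and the 1 to 7 line items per order are assumptions loosely modelled on a TPC-H style lineitem table), and the rows / nDV rule for equality predicates is a common textbook heuristic assumed here for illustration, not a description of what StatsRulesProcFactory actually does.

```python
import random
from collections import defaultdict


def make_rows(num_orders=10_000, num_suppliers=1_000, seed=0):
    """Build synthetic (orderkey, suppkey) rows; all sizes are assumptions."""
    rng = random.Random(seed)
    rows = []
    for orderkey in range(num_orders):
        # Each order gets between 1 and 7 line items.
        for _ in range(rng.randint(1, 7)):
            rows.append((orderkey, rng.randrange(num_suppliers)))
    return rows


def distinct_counts_per_group(rows):
    """Emulate: select count(distinct suppkey) ... group by orderkey."""
    groups = defaultdict(set)
    for orderkey, suppkey in rows:
        groups[orderkey].add(suppkey)
    return [len(suppkeys) for suppkeys in groups.values()]


rows = make_rows()
agg = distinct_counts_per_group(rows)

num_groups = len(agg)                            # rows leaving the GROUP BY
actual_ndv = len(set(agg))                       # distinct values of the aggregate
actual_matches = sum(1 for c in agg if c == 1)   # rows passing HAVING ... = 1

# Assumed heuristic: an equality predicate keeps rows / nDV rows.
est_if_unique = num_groups / num_groups          # aggregate treated as unique
est_with_true_ndv = num_groups / actual_ndv      # aggregate with its real nDV

print("groups:", num_groups)
print("nDV of aggregate column:", actual_ndv)
print("actual rows after HAVING:", actual_matches)
print("estimate, aggregate assumed unique:", est_if_unique)
print("estimate, using real nDV:", est_with_true_ndv)
```

Under these assumptions the aggregate column can take at most 7 distinct values no matter how many groups there are, so treating it as unique drives the estimate for the HAVING filter down to a single row, while the true output is a sizeable fraction of the groups.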