[
https://issues.apache.org/jira/browse/HIVE-29503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Bereznyakov reassigned HIVE-29503:
---------------------------------------------
Assignee: Konstantin Bereznyakov
> StatsRulesProcFactory: Join cardinality estimation explodes to cross product
> when column NDV is unknown (0)
> -----------------------------------------------------------------------------------------------------------
>
> Key: HIVE-29503
> URL: https://issues.apache.org/jira/browse/HIVE-29503
> Project: Hive
> Issue Type: Bug
> Reporter: Konstantin Bereznyakov
> Assignee: Konstantin Bereznyakov
> Priority: Major
> Labels: pull-request-available
> Attachments: ndv_zero_join_selectivity.q,
> ndv_zero_join_selectivity.q.out.q
>
>
> The attached file demonstrates how the estimated row count explodes when a
> self-join is applied to a table with 100M rows: the estimate comes out at
> 10 quadrillion (10^16) rows, i.e. the full cross product. At this scale it can
> be hard to maintain an accurate estimate of the true number of distinct values,
> so an NDV of "0" may be used to mean "unknown"; that convention is already
> expected and handled in several other estimation code paths.
> [^ndv_zero_join_selectivity.q]
> Current output: [^ndv_zero_join_selectivity.q.out.q] (the extension had to be
> modified, otherwise the file would not attach)
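
The arithmetic behind the numbers in the description can be sketched as follows. This is only an illustration using the textbook equi-join estimate |L| * |R| / max(NDV_L, NDV_R); it is not the code in StatsRulesProcFactory. The function names are hypothetical, the clamping of a zero denominator to 1 is an assumption about how the cross product arises, and the fallback in the second function is just one possible guard, not necessarily what the fix for this issue does.

```python
def naive_join_rows(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook estimate: |L| * |R| / max(ndv_L, ndv_R).

    Assumption: an NDV of 0 is taken at face value and the denominator is
    clamped to 1, so the estimate degenerates to the cross product.
    """
    denom = max(left_ndv, right_ndv, 1)
    return left_rows * right_rows // denom


def guarded_join_rows(left_rows, right_rows, left_ndv, right_ndv):
    """Same estimate, but NDV == 0 on both sides is treated as "unknown".

    Hypothetical fallback: assume the join key is unique on the smaller
    input, which caps the estimate at the size of the larger input.
    """
    denom = max(left_ndv, right_ndv)
    if denom <= 0:
        denom = max(min(left_rows, right_rows), 1)
    return left_rows * right_rows // denom


rows = 100_000_000  # 100M-row table joined with itself
print(naive_join_rows(rows, rows, 0, 0))    # 10000000000000000 (10 quadrillion)
print(guarded_join_rows(rows, rows, 0, 0))  # 100000000
```

With a known NDV both functions agree; they differ only when the NDV is reported as 0 on both sides.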
--
This message was sent by Atlassian Jira
(v8.20.10#820010)