[ 
https://issues.apache.org/jira/browse/HIVE-29503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-29503 started by Konstantin Bereznyakov.
-----------------------------------------------------
> StatsRulesProcFactory: Join cardinality estimation explodes to cross product 
> when column NDV is unknown (0)
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29503
>                 URL: https://issues.apache.org/jira/browse/HIVE-29503
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Konstantin Bereznyakov
>            Assignee: Konstantin Bereznyakov
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: ndv_zero_join_selectivity.q, 
> ndv_zero_join_selectivity.q.out.q
>
>
> The attached file demonstrates how the estimated number of records explodes
> when a self-join is applied to a table with 100M rows: the estimate comes out
> at 10 quadrillion records, i.e. the full cross product. At this scale it can
> be hard to maintain an accurate estimate of the true number of unique values,
> so a "0" may be used to mean "unknown"; this convention is already expected
> and handled in multiple estimation places.
> [^ndv_zero_join_selectivity.q]
> Current output: [^ndv_zero_join_selectivity.q.out.q] (the extension had to be
> changed, or the file would not attach)
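
To illustrate the arithmetic behind the report, here is a minimal sketch of the textbook equi-join estimate, |L| * |R| / max(ndv_L, ndv_R). It is not taken from StatsRulesProcFactory; the function name, the `treat_zero_as_unknown` flag, and the fallback it enables are hypothetical and only show how an NDV of 0, if clamped to 1, reproduces the cross-product figure quoted above.

```python
def estimate_join_rows(left_rows: int, right_rows: int,
                       left_ndv: int, right_ndv: int,
                       treat_zero_as_unknown: bool = False) -> int:
    """Textbook equi-join estimate: |L| * |R| / max(ndv_L, ndv_R).

    Hypothetical helper for illustration only; not Hive's implementation.
    """
    denominator = max(left_ndv, right_ndv)
    if denominator <= 0:
        if treat_zero_as_unknown:
            # Assumed fallback: with no NDV information, guess that the join
            # key is unique on the smaller input.
            denominator = max(1, min(left_rows, right_rows))
        else:
            # Naive handling: clamp to 1 to avoid dividing by zero, which
            # turns the estimate into the full cross product.
            denominator = 1
    return left_rows * right_rows // denominator


rows = 100_000_000  # 100M-row table joined with itself
naive = estimate_join_rows(rows, rows, 0, 0)
guarded = estimate_join_rows(rows, rows, 0, 0, treat_zero_as_unknown=True)
print(naive, guarded)
```

With both NDVs at 0, the naive branch yields 100M * 100M = 10^16 rows (10 quadrillion), while the assumed fallback keeps the estimate at 100M. The choice of fallback is a design decision; the one shown here is only one plausible option.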



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
