Konstantin Bereznyakov created HIVE-29503:
---------------------------------------------
Summary: StatsRulesProcFactory: Join cardinality estimation
explodes to cross product when column NDV is unknown (0)
Key: HIVE-29503
URL: https://issues.apache.org/jira/browse/HIVE-29503
Project: Hive
Issue Type: Bug
Reporter: Konstantin Bereznyakov
Attachments: ndv_zero_join_selectivity.q,
ndv_zero_join_selectivity.q.out.q
The attached file demonstrates how the estimated row count explodes when a
table with 100M rows is self-joined: the estimate comes out at 10 quadrillion
records (100M x 100M, i.e. a full cross product). At this scale it can be hard
to maintain an accurate estimate of the true number of distinct values (NDV),
so an NDV of "0" may be used to mean "unknown". That value is already expected
and handled in several other estimation code paths.
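
To make the arithmetic concrete, here is a minimal sketch of the textbook equi-join estimate, rows(L) * rows(R) / max(ndv(L.key), ndv(R.key)). It is not the actual StatsRulesProcFactory code; the class name, method names, and the fallback used for an unknown NDV are all hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; not taken from Hive's StatsRulesProcFactory.
final class JoinEstimateSketch {

    // Naive estimate: an NDV of 0 is clamped to 1 just to avoid dividing by
    // zero, which silently turns the join estimate into a cross product.
    static long naiveEstimate(long leftRows, long rightRows, long leftNdv, long rightNdv) {
        long denominator = Math.max(1L, Math.max(leftNdv, rightNdv));
        return Math.multiplyExact(leftRows, rightRows) / denominator;
    }

    // Guarded estimate: an NDV of 0 is treated as "unknown". The fallback
    // (assume the key is roughly unique on the larger side) is an assumption
    // made for this sketch, not the fix proposed in this issue.
    static long guardedEstimate(long leftRows, long rightRows, long leftNdv, long rightNdv) {
        long ndv = Math.max(leftNdv, rightNdv);
        if (ndv <= 0) {
            ndv = Math.max(leftRows, rightRows); // hypothetical fallback
        }
        long denominator = Math.max(1L, ndv);
        return Math.multiplyExact(leftRows, rightRows) / denominator;
    }

    public static void main(String[] args) {
        long rows = 100_000_000L; // 100M rows, self-joined
        System.out.println(naiveEstimate(rows, rows, 0L, 0L));   // prints 10000000000000000
        System.out.println(guardedEstimate(rows, rows, 0L, 0L)); // prints 100000000
    }
}
```

With both NDVs at 0, the naive version reproduces the 10 quadrillion figure, while any handling that treats 0 as "unknown" keeps the estimate within a plausible range.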
[^ndv_zero_join_selectivity.q]
Current output: [^ndv_zero_join_selectivity.q.out.q] (the extension had to be
changed, otherwise the file would not attach)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)