Konstantin Bereznyakov created HIVE-29503:
---------------------------------------------

             Summary: StatsRulesProcFactory: Join cardinality estimation 
explodes to cross product when column NDV is unknown (0)
                 Key: HIVE-29503
                 URL: https://issues.apache.org/jira/browse/HIVE-29503
             Project: Hive
          Issue Type: Bug
            Reporter: Konstantin Bereznyakov
         Attachments: ndv_zero_join_selectivity.q, 
ndv_zero_join_selectivity.q.out.q

The attached file demonstrates how the estimated number of records explodes 
when a self-join is applied to a table with 100M rows: the estimate comes out 
at 10 quadrillion records, i.e. the full cross product (100M x 100M). At this 
scale it can be hard to maintain an accurate estimate of the true number of 
unique values, so an NDV of "0" may be used to mean "unknown". That convention 
is already expected and handled in multiple estimation places.
[^ndv_zero_join_selectivity.q]
Current output: [^ndv_zero_join_selectivity.q.out.q] (the extension had to be 
modified, otherwise the file would not attach)
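
To make the arithmetic concrete, here is a minimal sketch in Python. It uses 
the textbook equi-join estimate rows(L) * rows(R) / max(ndv(L), ndv(R)), 
which is an assumption for illustration and not a copy of the code in 
StatsRulesProcFactory. Both function names are hypothetical, and the fallback 
in the second function is only one possible way to treat NDV = 0 as 
"unknown". It is not the fix proposed for this issue.

```python
def estimate_join_rows(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook equi-join estimate: |L| * |R| / max(ndv_L, ndv_R)."""
    denominator = max(left_ndv, right_ndv)
    if denominator <= 0:
        # Naive guard against division by zero. With a divisor of 1 the
        # estimate degenerates to the full cross product.
        denominator = 1
    return left_rows * right_rows // denominator


def estimate_join_rows_unknown_aware(left_rows, right_rows, left_ndv, right_ndv):
    """Same formula, but NDV <= 0 is read as "unknown" (hypothetical fallback)."""
    known = [ndv for ndv in (left_ndv, right_ndv) if ndv > 0]
    if known:
        denominator = max(known)
    else:
        # Assumption for illustration only: with no usable NDV on either
        # side, pretend the join key is unique on the smaller input.
        denominator = max(1, min(left_rows, right_rows))
    return left_rows * right_rows // denominator


rows = 100_000_000  # 100M-row table joined with itself
naive = estimate_join_rows(rows, rows, 0, 0)
aware = estimate_join_rows_unknown_aware(rows, rows, 0, 0)
print(naive)  # 10000000000000000 (10 quadrillion, the cross product)
print(aware)  # 100000000
```

With both NDVs at 0 the naive version returns 100M * 100M, which matches the 
10 quadrillion figure above, while any fallback that keeps the divisor away 
from 1 keeps the estimate bounded.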



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
