Thomas Rebele created HIVE-29366:
------------------------------------

             Summary: Use histogram statistics to improve the estimates of the 
equality predicates
                 Key: HIVE-29366
                 URL: https://issues.apache.org/jira/browse/HIVE-29366
             Project: Hive
          Issue Type: Improvement
            Reporter: Thomas Rebele


The histogram statistics could be used to improve the estimates for equality 
predicates.

Example on a metastore dump of a TPC-DS 30TB cluster with histograms (tested on 
commit ca105f8124072d19d88a83b2ced613d326c9a26b of 2025-12-04):
{code:java}
explain cbo joincost select count(*) from item where i_current_price = 0.09;
explain cbo joincost select count(*) from item where i_current_price = 99.99;
{code}
Both queries estimate the filter at {{rowcount = 49.196038760515385}}, exactly 
the same value as without histograms, so the equality estimate does not appear 
to use them.

Range predicates, by contrast, are estimated from the histogram statistics: 
{{i_current_price >= 99.99}} is estimated at {{rowcount = 1.0000000000136329}} 
and {{i_current_price <= 0.09}} at {{rowcount = 2592.0}}. The latter estimate 
could itself be improved (see HIVE-29365), because {{i_current_price <= 0.12}} 
yields the same {{rowcount = 2592.0}}. Dividing that estimate evenly among the 
4 values 0.09, 0.10, 0.11 and 0.12 gives 2592 / 4 = 648, which is quite close 
to the real value of 587.
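
The arithmetic for {{i_current_price = 0.09}} can be sketched as follows. This is only an illustration, not Hive code: the class {{EqualityEstimateSketch}} and the method {{estimateEquality}} are hypothetical names. It assumes that no rows lie below 0.09 and that the rows covered by the range estimate are spread evenly over the distinct values in that range.

```java
/**
 * Illustrative sketch only; these names are hypothetical and not part of Hive.
 * Derives an equality estimate from two histogram-based range estimates by
 * spreading the rows of the range evenly over its distinct values.
 */
final class EqualityEstimateSketch {

    /**
     * @param rowsAtOrBelowUpper    estimated rows with value <= upper bound of the range
     * @param rowsBelowLower        estimated rows with value < lower bound of the range
     * @param distinctValuesInRange number of distinct values assumed to lie in the range
     * @return estimated rows matching a single value inside the range
     */
    static double estimateEquality(double rowsAtOrBelowUpper,
                                   double rowsBelowLower,
                                   long distinctValuesInRange) {
        if (distinctValuesInRange <= 0) {
            throw new IllegalArgumentException("distinctValuesInRange must be positive");
        }
        // Rows falling inside the range; clamp at zero to guard against noisy estimates.
        double rowsInRange = Math.max(0.0, rowsAtOrBelowUpper - rowsBelowLower);
        // Uniformity assumption: every distinct value gets an equal share.
        return rowsInRange / distinctValuesInRange;
    }

    public static void main(String[] args) {
        // Numbers from the example: rowcount 2592.0 for i_current_price <= 0.12,
        // assumed 0 rows below 0.09, and the 4 values 0.09, 0.10, 0.11, 0.12.
        double estimate = estimateEquality(2592.0, 0.0, 4);
        System.out.println(estimate); // prints 648.0
    }
}
```

With the numbers from the example this gives 648 rows, compared with 587 actual rows and the current estimate of about 49.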

The ground truth:
{code:java}
trino:sf30000> select count(*) from item where i_current_price = 0.09;
 _col0 
-------
   587

trino:sf30000> select count(*) from item where i_current_price = 99.99;
 _col0 
-------
     3
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
