Rajesh Balamohan created HIVE-23788:

             Summary: FilterStatsRule misestimate causes hashtable computation to rehash often
                 Key: HIVE-23788
                 URL: https://issues.apache.org/jira/browse/HIVE-23788
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan

Depending on the available statistics, FilterStatsRule sometimes estimates the
row count as numRows/3. This causes a lower keyCount to be projected for the
hashtable computation, which in turn causes frequent rehashing.
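
To illustrate the effect, here is a minimal sketch of how a numRows/3 estimate
can undersize a hashtable. It is not Hive's hashtable code: the doubling
growth policy, the 0.75 load factor, and the names initial_capacity and
count_rehashes are all assumptions made for this example.

```python
# Hypothetical model, not Hive's implementation: a power-of-two table that
# doubles whenever the key count exceeds capacity * load_factor.

def initial_capacity(estimated_keys, load_factor=0.75):
    """Smallest power of two that holds estimated_keys within load_factor."""
    capacity = 1
    while capacity * load_factor < estimated_keys:
        capacity *= 2
    return capacity


def count_rehashes(estimated_keys, actual_keys, load_factor=0.75):
    """Count the doublings needed to hold actual_keys distinct keys."""
    capacity = initial_capacity(estimated_keys, load_factor)
    rehashes = 0
    while actual_keys > capacity * load_factor:
        capacity *= 2   # each doubling re-inserts every existing entry
        rehashes += 1
    return rehashes


num_rows = 3_000_000  # assumed row count, for illustration only

# Table sized from an accurate estimate: no rehash needed.
print(count_rehashes(estimated_keys=num_rows, actual_keys=num_rows))

# Table sized from numRows/3: it has to grow while being loaded.
print(count_rehashes(estimated_keys=num_rows // 3, actual_keys=num_rows))
```

In this model the undersized table has to re-insert all of its entries at
least once during the load, and a larger gap between estimate and actual
key count adds further doublings.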



E.g. TPCDS Q74 @ 10TB: as part of evaluating the conditions
"t_s_firstyear.year_total > 0, t_w_secyear.year_total / 
t_w_firstyear.year_total, t_s_secyear.year_total / t_s_firstyear.year_total",
FilterStatsRule projects 1/3rd of the rows, causing the hashtable in the
downstream vertex to be rehashed.

May have to check whether stats can be projected for these columns correctly.


This message was sent by Atlassian Jira
