Jesus Camacho Rodriguez created HIVE-14018:
----------------------------------------------
Summary: Make IN clause row selectivity estimation customizable
Key: HIVE-14018
URL: https://issues.apache.org/jira/browse/HIVE-14018
Project: Hive
Issue Type: Improvement
Components: Statistics
Affects Versions: 2.1.0, 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Minor
Since HIVE-13287 went in, we calculate IN clause estimates natively (instead
of just dividing the incoming number of rows by 2). However, because the
values of each column are assumed to be uniformly distributed, we might end up
heavily underestimating or overestimating the resulting number of rows.
This issue is to add a customizable factor that multiplies the IN clause
estimate so that we can alleviate this problem. The solution is not very
elegant, but it is the best we can do until we have histograms to improve our
estimates.
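
To illustrate the idea, here is a minimal sketch in Java of how such a factor
could scale a uniform-distribution estimate. It is not the code from
HIVE-13287 or from the patch for this issue: the class name, the method name,
the parameters, and the formula (IN-list size divided by the number of
distinct values) are all assumptions made for the example.

```java
public final class InClauseEstimate {
    private InClauseEstimate() {}

    /**
     * Hypothetical estimate of the rows surviving "col IN (v1, ..., vk)".
     * Assumes uniformly distributed values, so the base selectivity is
     * k / NDV, and then multiplies it by a user-supplied factor.
     */
    public static long estimateRows(long inputRows, long distinctValues,
                                    int inListSize, double factor) {
        if (inputRows <= 0 || inListSize <= 0) {
            return 0L;
        }
        // Guard against a missing or zero distinct-value count.
        long ndv = Math.max(1L, distinctValues);
        // Uniform assumption: each listed value matches 1/NDV of the rows.
        double uniform = Math.min(1.0, (double) inListSize / ndv);
        // The factor lets the user correct for skewed data.
        double scaled = inputRows * uniform * factor;
        // Never report more rows than the operator received.
        return Math.max(0L, Math.min(inputRows, Math.round(scaled)));
    }

    public static void main(String[] args) {
        // Factor 1.0 keeps the plain uniform estimate.
        System.out.println(estimateRows(1_000_000L, 1_000L, 5, 1.0));
        // A larger factor raises the estimate when the listed values are hot.
        System.out.println(estimateRows(1_000_000L, 1_000L, 5, 4.0));
    }
}
```

In this sketch a factor of 1.0 leaves the uniform estimate unchanged, a
factor above 1.0 compensates for IN lists that hit frequent values, and the
result is capped at the incoming row count.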