[ https://issues.apache.org/jira/browse/HIVE-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951338#comment-16951338 ]
Jesus Camacho Rodriguez commented on HIVE-22239:
------------------------------------------------

[~kgyrtkirk], [~mgergely], please let me know if there is anything else that should be addressed within the scope of this patch.

Thanks

> Scale data size using column value ranges
> -----------------------------------------
>
>                 Key: HIVE-22239
>                 URL: https://issues.apache.org/jira/browse/HIVE-22239
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22239.01.patch, HIVE-22239.02.patch,
>                      HIVE-22239.03.patch, HIVE-22239.04.patch, HIVE-22239.04.patch,
>                      HIVE-22239.05.patch, HIVE-22239.05.patch, HIVE-22239.06.patch,
>                      HIVE-22239.patch
>
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, min/max values for columns are only used to determine whether a
> certain range filter falls out of range and thus filters all rows or none at
> all. If it does not, we just use a heuristic that the condition will filter
> 1/3 of the input rows. Instead of using that heuristic, we can use another
> one that assumes that data will be uniformly distributed across that range,
> and calculate the selectivity for the condition accordingly.
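
The uniform-distribution heuristic described in the issue can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name `RangeSelectivity`, the method `lessThan`, and the constant `DEFAULT_SELECTIVITY` are hypothetical, and the 1/3 fallback is interpreted here as "one third of the rows pass the filter", which is an assumption. The sketch estimates the fraction of rows that satisfy `col < value` given the column's min/max statistics.

```java
public final class RangeSelectivity {
    // Assumed fallback when min/max statistics are missing or inconsistent.
    static final double DEFAULT_SELECTIVITY = 1.0 / 3.0;

    /**
     * Estimated fraction of rows satisfying "col < value", assuming the
     * column values are uniformly distributed over [min, max].
     */
    static double lessThan(double value, Double min, Double max) {
        if (min == null || max == null || min > max) {
            return DEFAULT_SELECTIVITY; // no usable statistics
        }
        if (value <= min) {
            return 0.0; // the predicate is out of range: no row qualifies
        }
        if (value > max) {
            return 1.0; // the predicate covers the whole range: every row qualifies
        }
        // Here min < value <= max, so max - min is strictly positive.
        return (value - min) / (max - min);
    }

    public static void main(String[] args) {
        // Column statistics: min = 0, max = 100; predicate: col < 25.
        System.out.println(lessThan(25.0, 0.0, 100.0)); // prints 0.25
    }
}
```

With min = 0 and max = 100, the predicate `col < 25` is estimated to keep a quarter of the rows instead of the fixed one third. The estimated data size would then scale with that fraction.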