[ 
https://issues.apache.org/jira/browse/HIVE-20260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-20260:
------------------------------------
    Attachment: HIVE-20260.01.patch

> NDV of a column shouldn't be scaled when row count is changed by filter on 
> another column
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-20260
>                 URL: https://issues.apache.org/jira/browse/HIVE-20260
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20260.01.patch, HIVE-20260.01wip01.patch, 
> HIVE-20260.01wip02.patch, HIVE-20260.01wip03.patch
>
>
> HIVE-17465 introduced progressive scaling of rowcounts in presence of 
> multiple filters. HIVE-19500 improved on that by also scaling col stats (NDV) 
> in such scenario. However, it should pay attention to column used in filter 
> expression and not scale for all filters. eg.,
> consider filter a = 1 and b = 2 ndv of column b should not be scaled down by 
> row count changes caused by a = 1
> Other way to say this that ndv of a particular column should be updated at 
> the end of computation of row count for that operator.
> Here are the possible cases where our estimates can be accurate (or close to)
> {code}
> case 1 - (d_year = 2001 and d_moy=1)
> case 2 - (d_year = 2001 and d_year IN (2001, 2002))
> case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
> case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
> case 5 - (d_date = '1999-01-01')
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to