[ 
https://issues.apache.org/jira/browse/HIVE-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874286#comment-16874286
 ] 

Jesus Camacho Rodriguez commented on HIVE-21928:
------------------------------------------------

[~kgyrtkirk], I do not think the patch I uploaded is the right solution; I 
only put it together to run some tests.
With this patch, we scale the ndv for the top-level AND the same way we did 
before HIVE-20260 went in, which is not what we want either. I also still 
think there may be issues with ndv scaling in the presence of nested ANDs 
(we would skip the scaling for some of the columns).
We could revert HIVE-20260 for the time being, but that would cause some 
regressions too.
One proposal to fix this is to rewrite the scaling logic to compute a 
reduction ratio per column instead of a single global ratio.
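
To make the per-column idea concrete, here is a rough sketch. It is not Hive 
code: the class PerColumnNdvSketch, its predicate types and scaleNdv are all 
hypothetical names made up for illustration. It assumes that an equality on a 
column keeps 1/ndv of that column's distinct values, that a constant TRUE 
filters nothing, and that columns not referenced by the predicate keep their 
ndv.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive code: every name here is invented for illustration.
final class PerColumnNdvSketch {

  /** Minimal predicate tree: AND over children, equality on a column, or constant TRUE. */
  interface Pred {}

  static final class And implements Pred {
    final List<Pred> children;
    And(Pred... children) { this.children = Arrays.asList(children); }
  }

  static final class ColEq implements Pred {
    final String column;
    ColEq(String column) { this.column = column; }
  }

  static final class TrueConst implements Pred {}

  /**
   * Walks the predicate, descending into nested ANDs, and accumulates one
   * reduction ratio per referenced column (assumption: equality keeps 1/ndv).
   */
  static void collectRatios(Pred p, Map<String, Long> ndv, Map<String, Double> ratios) {
    if (p instanceof And) {
      for (Pred child : ((And) p).children) {
        collectRatios(child, ndv, ratios);
      }
    } else if (p instanceof ColEq) {
      String col = ((ColEq) p).column;
      double ratio = 1.0 / Math.max(1L, ndv.getOrDefault(col, 1L));
      ratios.merge(col, ratio, (a, b) -> a * b);
    }
    // TrueConst contributes no reduction.
  }

  /** Scales each column's ndv by its own ratio; unreferenced columns use ratio 1.0. */
  static Map<String, Long> scaleNdv(Pred p, Map<String, Long> ndv) {
    Map<String, Double> ratios = new HashMap<>();
    collectRatios(p, ndv, ratios);
    Map<String, Long> scaled = new HashMap<>();
    for (Map.Entry<String, Long> e : ndv.entrySet()) {
      double ratio = ratios.getOrDefault(e.getKey(), 1.0);
      // Never let the ndv drop below 1.
      scaled.put(e.getKey(), Math.max(1L, Math.round(e.getValue() * ratio)));
    }
    return scaled;
  }
}
```

Because the ratios are collected per column while walking the whole tree, the 
nesting depth of the ANDs and any constant TRUE operands do not change the 
result in this sketch: scaleNdv gives the same ndv map for a bare equality on 
x as for an AND of that equality with two TRUE constants.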

> Fix for statistics annotation in nested AND expressions
> -------------------------------------------------------
>
>                 Key: HIVE-21928
>                 URL: https://issues.apache.org/jira/browse/HIVE-21928
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-21928.patch
>
>
> Discovered while working on HIVE-21867. Predicates with nested AND 
> expressions may result in different stats, even if the predicates are 
> essentially equivalent (from a stats estimation standpoint).
> For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
