[ 
https://issues.apache.org/jira/browse/HIVE-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873888#comment-16873888
 ] 

Jesus Camacho Rodriguez commented on HIVE-21928:
------------------------------------------------

This patch does not solve the problem, but it should help identify which tests
are affected (I suspect many). This seems to have been broken since HIVE-20260.
{{StatsRulesProcFactory.process}} calls {{evaluateExpression}}. If the node is 
an {{AND}}, then we iterate through its children and compute the new row count. 
Next, we call {{updateStats}}, but on a clone of the input stats, so the method
does not modify the actual stats (the previous logic expected this). Once we
have the final row count for the predicate, it is the responsibility of
{{process}} to update the stats accordingly and to perform other operations,
such as scaling the ndv of the different columns.

HIVE-20260 tried to use a different scaling ratio for each column, but it ends
up scaling only the column in the last node of the AND clause; the rest are
ignored. I also think the logic may have problems with different levels of
nesting and with column scaling, possibly not updating the ndv at all depending
on the complexity of the predicate. This logic needs to be revisited.

[~kgyrtkirk], could you take a look at this, since you introduced this logic?
I think it is important that we get this right, as ndv has a huge impact on
several steps of our planning phase.
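
To make the failure mode concrete, here is a toy sketch of the pattern described
above. It is not Hive code: the {{Stats}} class, both method names, the
selectivity model (an equality predicate keeps rows / ndv rows), and the ndv
scaling formula are all assumptions made up for illustration. The first method
mimics scaling only the column of the last AND child; the second is one
possible alternative that scales every referenced column, not a proposed fix.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Hive.
class NdvScalingSketch {

    /** Hypothetical stand-in for table stats: a row count plus per-column ndv. */
    static final class Stats {
        long numRows;
        final Map<String, Long> ndv = new LinkedHashMap<>();

        Stats(long numRows) { this.numRows = numRows; }

        Stats copy() {
            Stats s = new Stats(numRows);
            s.ndv.putAll(ndv);
            return s;
        }
    }

    /** Assumed scaling rule: ndv * newRows / oldRows, rounded up, at least 1. */
    static long scaleNdv(long ndv, long newRows, long oldRows) {
        return Math.max(1L, (ndv * newRows + oldRows - 1) / oldRows);
    }

    /** Assumed selectivity of "col = const": keep rows / ndv(col) rows. */
    static long rowsAfterEquality(long rows, long ndv) {
        return Math.max(1L, rows / ndv);
    }

    /** Mimics the reported behavior: only the last child's column is scaled. */
    static Stats scaleLastOnly(Stats input, List<String> andChildren) {
        long rows = input.numRows;
        String lastCol = null;
        long lastOld = 1, lastNew = 1;
        for (String col : andChildren) {
            Stats scratch = input.copy();   // per-child updates land on a clone
            long newRows = rowsAfterEquality(rows, input.ndv.get(col));
            scratch.numRows = newRows;      // ...and the clone is then discarded
            lastCol = col;                  // each child overwrites the previous one
            lastOld = rows;
            lastNew = newRows;
            rows = newRows;
        }
        Stats out = input.copy();
        out.numRows = rows;
        if (lastCol != null) {
            out.ndv.put(lastCol, scaleNdv(input.ndv.get(lastCol), lastNew, lastOld));
        }
        return out;
    }

    /** One possible alternative: scale the column of every child. */
    static Stats scaleEveryColumn(Stats input, List<String> andChildren) {
        Stats out = input.copy();
        long rows = input.numRows;
        for (String col : andChildren) {
            long newRows = rowsAfterEquality(rows, input.ndv.get(col));
            out.ndv.put(col, scaleNdv(out.ndv.get(col), newRows, rows));
            rows = newRows;
        }
        out.numRows = rows;
        return out;
    }
}
```

With 1000 rows, ndv(x) = 10 and ndv(y) = 20, both methods estimate the same row
count for AND(x = c1, y = c2), but the first leaves ndv(x) at 10 while the
second scales it down, which is the kind of divergence described above.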

> Fix for statistics annotation in nested AND expressions
> -------------------------------------------------------
>
>                 Key: HIVE-21928
>                 URL: https://issues.apache.org/jira/browse/HIVE-21928
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-21928.patch
>
>
> Discovered while working on HIVE-21867. Predicates with nested AND
> expressions may result in different stats, even if the predicates are
> basically similar (from a stats estimation standpoint).
> For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
