[ https://issues.apache.org/jira/browse/HIVE-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873888#comment-16873888 ]
Jesus Camacho Rodriguez commented on HIVE-21928:
------------------------------------------------

This patch does not solve the problem, but it will help identify which tests may be affected by this (I suspect many).

This seems to be broken since HIVE-20260. {{StatsRulesProcFactory.process}} calls {{evaluateExpression}}. If the node is an {{AND}}, then we iterate through its children and compute the new row count. Next, we call {{updateStats}}, but we do it over a clone of the input stats. This basically means that the method is not modifying the actual stats (this was expected in previous logic). When we obtain the final row count for the predicate, it is the responsibility of {{process}} to update the stats accordingly and do other operations such as scaling the ndv for the different columns.

HIVE-20260 was trying to use different scaling ratios for each column, but it ends up scaling only the column in the last node of the AND clause (the rest are ignored). Also, I think the logic may have problems with different levels of nesting and scaling of columns, possibly not updating their ndv at all depending on the complexity of the predicate. This logic needs to be revisited.

[~kgyrtkirk], could you take a look at this since you introduced this logic? I think it is important that we get this right, as ndv has a huge impact on different steps of our planning phase.

> Fix for statistics annotation in nested AND expressions
> -------------------------------------------------------
>
>                 Key: HIVE-21928
>                 URL: https://issues.apache.org/jira/browse/HIVE-21928
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-21928.patch
>
>
> Discovered while working on HIVE-21867. Having predicates with nested AND
> expressions may result in different stats, even if predicates are basically
> similar (from stats estimation standpoint).
> For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
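
To make the symptom described in the comment easier to picture, here is a minimal, self-contained sketch. It is not Hive code: the class {{NestedAndNdvSketch}}, the methods {{scaleLastOnly}} and {{scaleAll}}, and the plain column-name-to-ndv map are all hypothetical stand-ins, and the ratios are made-up inputs. It only assumes the pattern described above, where each child of the AND applies its scaling to a fresh clone of the input stats, so only the last child's column ends up scaled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Hive's StatsRulesProcFactory. All names are hypothetical.
class NestedAndNdvSketch {

    // Mimics the reported symptom: every child of the AND scales its column on a
    // fresh clone of the ORIGINAL stats, so earlier children's updates are lost
    // and only the last child's column is scaled in the returned map.
    static Map<String, Long> scaleLastOnly(Map<String, Long> ndv, String[] cols, double[] ratios) {
        Map<String, Long> result = new LinkedHashMap<>(ndv);
        for (int i = 0; i < cols.length; i++) {
            Map<String, Long> clone = new LinkedHashMap<>(ndv); // clone of the input, not of result
            clone.put(cols[i], Math.max(1L, Math.round(ndv.get(cols[i]) * ratios[i])));
            result = clone;
        }
        return result;
    }

    // One possible intended behavior (an assumption): accumulate every child's
    // scaling on the same working copy, so each column keeps its own ratio.
    static Map<String, Long> scaleAll(Map<String, Long> ndv, String[] cols, double[] ratios) {
        Map<String, Long> result = new LinkedHashMap<>(ndv);
        for (int i = 0; i < cols.length; i++) {
            result.put(cols[i], Math.max(1L, Math.round(result.get(cols[i]) * ratios[i])));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> ndv = new LinkedHashMap<>();
        ndv.put("x", 100L);
        ndv.put("y", 50L);
        String[] cols = {"x", "y"};      // columns referenced by the AND's children
        double[] ratios = {0.1, 0.5};    // made-up per-column scaling ratios
        System.out.println("last-only: " + scaleLastOnly(ndv, cols, ratios));
        System.out.println("all:       " + scaleAll(ndv, cols, ratios));
    }
}
```

In this toy model the ndv of {{x}} stays at its input value under {{scaleLastOnly}} while {{y}} is scaled, which is the "rest are ignored" effect. Whether accumulating on a single working copy is the right fix for Hive is a separate question that depends on how nested predicates should combine.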