[ https://issues.apache.org/jira/browse/HIVE-18804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397135#comment-16397135 ]

Zoltan Haindrich commented on HIVE-18804:
-----------------------------------------

[~ashutoshc] it's the same column, but it turned out that it appears under a 
different name. For example, in 
[constantpropagate|https://github.com/apache/hive/blob/12041d39f052dc8e4858815da15c967cb378fae9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java#L1223]
 the columnExprMap is extended with the '_col'-aliased expression. This 
eventually multiplies the stats datasize 
[here|https://github.com/apache/hive/blob/12041d39f052dc8e4858815da15c967cb378fae9/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1524].
Would it be better to fix this the other way around, by removing these 
"_col" additions to the exprmap?
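The sketch below is a minimal illustration of the double counting. It uses a hypothetical, simplified model and is not Hive's API. The exprMap is reduced to output name -> source column name, and each source column has a made-up per-row size of 8 bytes. Summing one entry per exprMap key counts the column twice once a "_col0" alias is added for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model (NOT Hive's API): keys are output column
// names, values stand in for the expression each output name maps to.
public class ExprMapDoubleCount {

    // Assumed per-row data size of each source column, in bytes (made-up figure).
    static final Map<String, Long> SOURCE_COL_SIZE = Map.of("key", 8L);

    // One size contribution per exprMap entry, summed. Two names that point
    // at the same source column are therefore counted twice.
    static long totalDataSize(Map<String, String> exprMap, long numRows) {
        long total = 0;
        for (Map.Entry<String, String> e : exprMap.entrySet()) {
            total += numRows * SOURCE_COL_SIZE.get(e.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> exprMap = new LinkedHashMap<>();
        exprMap.put("key", "key");
        System.out.println(totalDataSize(exprMap, 100)); // 800
        // A "_col" alias for the same column (as constant propagation adds).
        exprMap.put("_col0", "key");
        System.out.println(totalDataSize(exprMap, 100)); // 1600
    }
}
```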

> StatsUtils.getColStatisticsFromExprMap may only provide info for a column once
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-18804
>                 URL: https://issues.apache.org/jira/browse/HIVE-18804
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-18804.01.patch, HIVE-18804.02.patch
>
>
> Currently {{StatsUtils.getColStatisticsFromExprMap}} may inflate the 
> datasize by passing the statistics for the same column more than once:
> https://github.com/apache/hive/blob/e8e5ab24616aa834f4966efe3a5f437f6bee4d1d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1529
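One way to read the issue title ("may only provide info for a column once") is to deduplicate exprMap entries by their underlying expression before collecting statistics. The sketch below assumes the same simplified model as before: output name -> source column name as a plain string. It is hypothetical and not the patch's actual code. Real code would compare ExprNodeDesc objects rather than strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (NOT Hive code): output name -> source expression,
// with the source expression reduced to a column-name string.
public class ExprMapDedup {

    // Returns the output names whose statistics should be reported. Only the
    // first name seen for each distinct source expression is kept.
    static List<String> namesToReport(Map<String, String> exprMap) {
        Set<String> seenSources = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : exprMap.entrySet()) {
            if (seenSources.add(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> exprMap = new LinkedHashMap<>();
        exprMap.put("key", "key");
        exprMap.put("_col0", "key");   // alias of the same column
        exprMap.put("value", "value");
        System.out.println(namesToReport(exprMap)); // [key, value]
    }
}
```

Keeping only the first name means later aliases get no statistics entry. If a downstream operator looks statistics up by the alias name, that lookup would miss. This trade-off is presumably why removing the "_col" additions at the source was raised as an alternative.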



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
