[ https://issues.apache.org/jira/browse/HIVE-18804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397135#comment-16397135 ]
Zoltan Haindrich commented on HIVE-18804:
-----------------------------------------

[~ashutoshc] It is the same column, but it turned out that the name is different. For example, at [constantpropagate|https://github.com/apache/hive/blob/12041d39f052dc8e4858815da15c967cb378fae9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java#L1223] the columnExprMap is extended with the '_col' aliased expression; this eventually multiplies the stats data size [here|https://github.com/apache/hive/blob/12041d39f052dc8e4858815da15c967cb378fae9/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1524].

Would it be better to fix this the other way around, by removing these additions of "_col" to the exprmap?

> StatsUtils.getColStatisticsFromExprMap may only provide info for a column once
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-18804
>                 URL: https://issues.apache.org/jira/browse/HIVE-18804
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-18804.01.patch, HIVE-18804.02.patch
>
>
> Currently {{StatsUtils.getColStatisticsFromExprMap}} may duplicate the data size by passing the info about the same column more than once:
> https://github.com/apache/hive/blob/e8e5ab24616aa834f4966efe3a5f437f6bee4d1d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1529
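
To illustrate the double counting described above, here is a minimal, self-contained sketch. It is not Hive code: the class ColStatsDedupSketch, its methods, and the sample column names and sizes are all hypothetical, and the expression map is simplified to a plain map from output name to source column. It only shows how an extra '_col' alias for the same column can inflate a summed data size, and how counting each source column once avoids that.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Hive APIs.
final class ColStatsDedupSketch {

    // Naive total: adds a size for every map entry, so a column that is
    // reachable under two output names is counted twice.
    static long naiveDataSize(Map<String, String> exprMap, Map<String, Long> sizeBySource) {
        long total = 0;
        for (String source : exprMap.values()) {
            total += sizeBySource.getOrDefault(source, 0L);
        }
        return total;
    }

    // De-duplicated total: each source column contributes at most once,
    // no matter how many output names point at it.
    static long dedupedDataSize(Map<String, String> exprMap, Map<String, Long> sizeBySource) {
        Set<String> seen = new HashSet<>();
        long total = 0;
        for (String source : exprMap.values()) {
            if (seen.add(source)) { // add() returns false for a repeated source
                total += sizeBySource.getOrDefault(source, 0L);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Output name -> source column (assumed sample data).
        Map<String, String> exprMap = new LinkedHashMap<>();
        exprMap.put("key", "t.key");
        exprMap.put("_col0", "t.key"); // alias of the same column
        exprMap.put("value", "t.value");

        // Source column -> estimated data size in bytes (assumed sample data).
        Map<String, Long> sizeBySource = new LinkedHashMap<>();
        sizeBySource.put("t.key", 1000L);
        sizeBySource.put("t.value", 100L);

        System.out.println("naive   = " + naiveDataSize(exprMap, sizeBySource));
        System.out.println("deduped = " + dedupedDataSize(exprMap, sizeBySource));
    }
}
```

The sketch de-duplicates on the consumer side, which matches the issue title. The alternative raised in the comment, not adding the '_col' entries to the map in the first place, would make the naive sum correct without a seen-set.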