[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878774#action_12878774 ]
Ning Zhang commented on HIVE-1139:
----------------------------------

Soundararajan, thanks for the contribution. However, it seems the persistent hash map package is licensed under GNU v.3, which is not compatible with the Apache license. We also looked at other existing open source packages before implementing our own HashMapWrapper, and license compatibility was one of the problems we found at the time.

> GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Ning Zhang
> Assignee: Arvind Prabhakar
> Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to
> keep all distinct keys in main memory. This can lead to an OOM exception when
> there are too many distinct keys for a particular mapper. A workaround is to
> set the map split size smaller so that each mapper processes fewer rows.
> A better solution is to use the persistent HashMapWrapper (currently used in
> CommonJoinOperator) to spill overflow rows to disk.
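
For illustration only, the spill-to-disk idea in the issue description could look roughly like the sketch below. It is not Hive's HashMapWrapper and does not use any Hive API: the class name SpillingCounter, the key limit, and the temp-file format are all assumptions made for the example. It counts rows per key, keeps at most a fixed number of distinct keys in memory, and appends rows for any further keys to a temporary file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bounded in-memory aggregation map that spills overflow rows to disk. */
final class SpillingCounter implements Closeable {
    private final int maxInMemoryKeys;              // assumed limit on distinct keys held in memory
    private final Map<String, Long> memory = new HashMap<>();
    private final Path spillFile;
    private final DataOutputStream spillOut;
    private long spilledRows = 0;

    SpillingCounter(int maxInMemoryKeys) throws IOException {
        this.maxInMemoryKeys = maxInMemoryKeys;
        this.spillFile = Files.createTempFile("groupby-spill", ".bin");
        this.spillOut = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(spillFile)));
    }

    /** Aggregates one row: in memory if the key fits, otherwise appended to the spill file. */
    void add(String key) throws IOException {
        Long current = memory.get(key);
        if (current != null) {
            memory.put(key, current + 1);
        } else if (memory.size() < maxInMemoryKeys) {
            memory.put(key, 1L);
        } else {
            spillOut.writeUTF(key);                 // overflow row goes to disk
            spilledRows++;
        }
    }

    int inMemoryKeys() {
        return memory.size();
    }

    long spilledRows() {
        return spilledRows;
    }

    /**
     * Merges in-memory counts with spilled rows. For demonstration this builds one
     * map; a real implementation would process the spill in bounded-memory passes.
     */
    Map<String, Long> result() throws IOException {
        spillOut.flush();
        Map<String, Long> merged = new HashMap<>(memory);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(spillFile)))) {
            for (long i = 0; i < spilledRows; i++) {
                merged.merge(in.readUTF(), 1L, Long::sum);
            }
        }
        return merged;
    }

    @Override
    public void close() throws IOException {
        spillOut.close();
        Files.deleteIfExists(spillFile);
    }
}
```

The point of the sketch is only that the number of keys held on the heap stays bounded regardless of how many distinct keys the mapper sees; how the spilled rows are later aggregated is a separate design decision.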