[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875650#action_12875650 ]
Arvind Prabhakar commented on HIVE-1139:
----------------------------------------

Soundararajan, Ning - Yes, I am planning to start working on it next week. I expect it to take until at least mid-to-late in the week to get a patch available. However, if that schedule does not work for you, please feel free to take this issue into your queue and go ahead. It would be great if you could confirm either way first.

Arvind

> GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-1139
>                 URL: https://issues.apache.org/jira/browse/HIVE-1139
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Arvind Prabhakar
>
> When a partial aggregation is performed on a mapper, a HashMap is created to
> keep all distinct keys in main memory. This can lead to an OOM exception when
> there are too many distinct keys for a particular mapper. A workaround is to
> set the map split size smaller so that each mapper processes fewer rows.
> A better solution is to use the persistent HashMapWrapper (currently used in
> CommonJoinOperator) to spill overflow rows to disk.
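
For reference, here is a minimal sketch of the memory problem described in the issue and one way to bound it. It is not Hive code, and it is not the HashMapWrapper approach proposed in the issue: instead of spilling overflow rows to disk, it flushes the partial counts downstream whenever the in-memory map reaches a cap. That assumes the reducer can merge several partial results for the same key. The class BoundedPartialAggregator and all of its methods are hypothetical names used only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive code): map-side partial COUNT aggregation
 * that caps the number of distinct keys held in memory.
 */
public class BoundedPartialAggregator {
    private final int maxEntries;
    // In-memory partial counts; never holds more than maxEntries keys.
    private final Map<String, Long> counts = new HashMap<>();
    // Stand-in for the mapper's output collector.
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    public BoundedPartialAggregator(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
    }

    /** Counts one row for the given grouping key. */
    public void add(String key) {
        // A new key would push the map past the cap: flush first.
        if (!counts.containsKey(key) && counts.size() >= maxEntries) {
            flush();
        }
        counts.merge(key, 1L, Long::sum);
    }

    /** Emits every partial count downstream and frees the map. */
    public void flush() {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        counts.clear();
    }

    public int inMemorySize() {
        return counts.size();
    }

    public List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }

    /** Reducer-side step: sums the partial counts emitted for each key. */
    public static Map<String, Long> merge(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map.Entry<String, Long> p : partials) {
            total.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return total;
    }
}
```

The trade-off in this sketch is that a key may be emitted more than once by the same mapper, so more data reaches the reducer. A disk-backed map, as suggested in the issue, would keep one partial result per key at the cost of disk I/O on the mapper.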