[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879061#action_12879061 ]
Ning Zhang commented on HIVE-1139: ---------------------------------- I'm not aware of an efficient serde that are reflection based. The XMLEncoder/Decoder is JAXB-based and are very inefficient. That's another reason we don't want to use it in the execution code path. A better way is to use the Hive SerDe (e.g., LazyBinarySerde) just as it is done in RowContainer. Another way to tackle this problem is have more accurate estimate of how many rows can be fit into the main memory. The current code checks the amount of available memory and use 0.25 (by default) of them to hold the hashmap. We set it to 0.15 in our environment and it works for most cases. 0.15 is probably a little bit conservative. Some experiments need to be done to tune this parameter so that most cases will be fit into main memory and only for the exceptional cases the secondary storage will be used. > GroupByOperator sometimes throws OutOfMemory error when there are too many > distinct keys > ---------------------------------------------------------------------------------------- > > Key: HIVE-1139 > URL: https://issues.apache.org/jira/browse/HIVE-1139 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.5.0 > Reporter: Ning Zhang > Assignee: Arvind Prabhakar > Attachments: PersistentMap.zip > > > When a partial aggregation performed on a mapper, a HashMap is created to > keep all distinct keys in main memory. This could leads to OOM exception when > there are too many distinct keys for a particular mapper. A workaround is to > set the map split size smaller so that each mapper takes less number of rows. > A better solution is to use the persistent HashMapWrapper (currently used in > CommonJoinOperator) to spill overflow rows to disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.