[ https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang updated HIVE-1158:
-----------------------------
Attachment: HIVE-1158.patch
> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
> Key: HIVE-1158
> URL: https://issues.apache.org/jira/browse/HIVE-1158
> Project: Hadoop Hive
> Issue Type: Improvement
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1158.patch
>
>
> A map-side join caches the small table in memory and joins it with the split
> of the large table on the mapper side. If the small table is too large, it
> uses a RowContainer to cache the number of rows given by the parameter
> hive.join.cache.size, whose default value is 25000. The same parameter is also
> used by regular reducer-side joins to cache all input tables except the
> streaming table. That default is too large for the map-side join bucket size
> and sometimes results in OOM exceptions. We should define a separate
> parameter so that the two cache sizes can be set independently.
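
The separation described above can be sketched as follows. This is only an illustration, not the code in the attached patch: the issue text does not name the new parameter, so the key hive.mapjoin.bucket.cache.size, its default of 100, and the class and helper names below are assumptions. Only hive.join.cache.size and its default of 25000 come from the description.

```java
import java.util.HashMap;
import java.util.Map;

public class JoinCacheSizes {
    // Parameter and default stated in the issue description.
    static final String JOIN_CACHE_KEY = "hive.join.cache.size";
    static final int JOIN_CACHE_DEFAULT = 25000;

    // Assumed name and default for the proposed map-side parameter (hypothetical).
    static final String MAPJOIN_CACHE_KEY = "hive.mapjoin.bucket.cache.size";
    static final int MAPJOIN_CACHE_DEFAULT = 100;

    // Read an integer setting, falling back to the default when the key is unset.
    static int readInt(Map<String, String> conf, String key, int fallback) {
        String raw = conf.get(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    // Pick the row-cache size by join type, so that tuning one kind of join
    // no longer changes the other.
    static int cacheSizeFor(Map<String, String> conf, boolean mapSideJoin) {
        return mapSideJoin
                ? readInt(conf, MAPJOIN_CACHE_KEY, MAPJOIN_CACHE_DEFAULT)
                : readInt(conf, JOIN_CACHE_KEY, JOIN_CACHE_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(JOIN_CACHE_KEY, "50000");            // raise the reducer-side cache only
        System.out.println(cacheSizeFor(conf, false)); // reducer-side join
        System.out.println(cacheSizeFor(conf, true));  // map-side join keeps its own default
    }
}
```

With two keys, raising the reducer-side cache leaves the map-side bucket size at its own smaller value, which is the behavior the issue asks for.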