Introducing a new parameter for Map-side join bucket size
---------------------------------------------------------
Key: HIVE-1158
URL: https://issues.apache.org/jira/browse/HIVE-1158
Project: Hadoop Hive
Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Ning Zhang
Assignee: Ning Zhang
Map-side join caches the small table in memory and joins it with each split of
the large table on the mapper side. If the small table is too large, it uses
RowContainer to cache a number of rows given by the parameter
hive.join.cache.size, whose default value is 25000. The same parameter is also
used by regular reducer-side joins to cache all input tables except the
streaming table. This default is too large for the map-side join bucket size and
sometimes results in OOM exceptions. We should define a separate parameter so
that the two cache sizes can be set independently.
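
A minimal sketch of what separating the two settings could look like, written
against plain java.util.Properties rather than Hive's own configuration classes.
The key hive.mapjoin.bucket.cache.size, its default of 100, and the class and
method names are assumptions made for illustration; only hive.join.cache.size
and its default of 25000 come from the description above.

```java
import java.util.Properties;

/** Sketch: resolve independent cache sizes for reducer-side and map-side joins. */
final class JoinCacheSizes {
    // Existing parameter and default, as stated in the issue description.
    static final String JOIN_CACHE_KEY = "hive.join.cache.size";
    static final int JOIN_CACHE_DEFAULT = 25000;

    // Hypothetical key and default for the proposed map-side parameter.
    static final String MAPJOIN_CACHE_KEY = "hive.mapjoin.bucket.cache.size";
    static final int MAPJOIN_CACHE_DEFAULT = 100;

    private JoinCacheSizes() {}

    /** Rows cached per non-streaming table in a reducer-side join. */
    static int reduceJoinCacheSize(Properties conf) {
        return readInt(conf, JOIN_CACHE_KEY, JOIN_CACHE_DEFAULT);
    }

    /** Rows cached per bucket in a map-side join. */
    static int mapJoinBucketCacheSize(Properties conf) {
        return readInt(conf, MAPJOIN_CACHE_KEY, MAPJOIN_CACHE_DEFAULT);
    }

    // Returns the configured integer, or the fallback when the key is unset.
    private static int readInt(Properties conf, String key, int fallback) {
        String raw = conf.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }
}
```

With this split, raising hive.join.cache.size for a reducer-side join leaves the
map-side bucket size unchanged, so the two can be tuned against different memory
budgets.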