[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403468#comment-13403468 ]
Dmitriy V. Ryaboy commented on PIG-2774:
----------------------------------------
I like the first paragraph of what you said; I think the second is more
applicable to skew join (reduce side) than to map join (map side). With a
map-side join, other operations may be queued up after the join on the same
mapper, and tracing through separate split files would get unnecessarily
complicated.
> Fix merge join to work with many duplicate left keys
> ----------------------------------------------------
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
> Issue Type: Bug
> Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is
> large, because it accumulates all of them in memory. There are two ways around
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side
> index.
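
To make option 1 concrete, here is a rough sketch in Python (not Pig's Java,
and not Pig's actual code). SpillableBuffer, join_duplicate_keys, and the
max_in_memory threshold are hypothetical names made up for illustration. The
idea is only that left tuples sharing a key are buffered in memory up to a
limit, and anything beyond that is serialized to a temporary file and read
back when the join output is produced.

```python
import pickle
import tempfile


class SpillableBuffer:
    """Hypothetical buffer: holds tuples in memory, spills to disk past a limit."""

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory
        self._memory = []      # tuples not yet written to disk
        self._spill = None     # temp file, created lazily on first spill
        self.spilled = 0       # number of tuples currently on disk

    def add(self, tup):
        self._memory.append(tup)
        if len(self._memory) >= self.max_in_memory:
            self._flush()

    def _flush(self):
        if self._spill is None:
            self._spill = tempfile.TemporaryFile()
        self._spill.seek(0, 2)  # always append at the end of the file
        for tup in self._memory:
            pickle.dump(tup, self._spill)
        self.spilled += len(self._memory)
        self._memory = []

    def __iter__(self):
        # Replay spilled tuples first, then the in-memory tail,
        # which preserves insertion order.
        if self._spill is not None:
            self._spill.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self._spill)
        yield from self._memory

    def close(self):
        if self._spill is not None:
            self._spill.close()
            self._spill = None


def join_duplicate_keys(left_tuples, right_tuples, max_in_memory=1000):
    """Cross all left tuples for one key with the matching right tuples."""
    buf = SpillableBuffer(max_in_memory)
    for left in left_tuples:
        buf.add(left)
    try:
        for right in right_tuples:
            for left in buf:
                yield left + right
    finally:
        buf.close()
```

With max_in_memory=3 and ten duplicate left tuples, nine of them end up on
disk and one stays in memory, while the join output is the same as if
everything had been held in memory. The cost is a re-read of the spill file
for each matching right tuple.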