[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402923#comment-13402923 ]
Dmitriy V. Ryaboy commented on PIG-2774:
----------------------------------------
Generating non-standard splits can get tricky in the solution Thejas proposed.
Also, I'd like to avoid having the user encode these details in the Pig script.
> Fix merge join to work with many duplicate left keys
> ----------------------------------------------------
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
> Issue Type: Bug
> Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is
> large, because it accumulates all of them in memory. There are two possible
> solutions to this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Emit join output periodically, and re-seek on the right-hand side
> index.
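
The second option in the description above can be illustrated with a small
sketch. This is not Pig's merge join code; it is a minimal Python model that
assumes both inputs are sorted by key and that the right-hand side is an
in-memory list with a sorted key list standing in for the right-side index.
The names chunked_merge_join and max_buffered are hypothetical. Left tuples
with the same key are buffered in batches of at most max_buffered; after each
batch the join output is emitted and the right side is re-sought to the start
of that key, so memory use no longer grows with the number of duplicate left
keys.

```python
from bisect import bisect_left
from itertools import groupby, islice
from operator import itemgetter


def chunked_merge_join(left, right, max_buffered=2):
    """Yield (key, left_value, right_value) for two inputs sorted by key.

    left:  iterable of (key, value) pairs, sorted by key.
    right: list of (key, value) pairs, sorted by key. It stands in for an
           indexed, seekable right-hand input (an assumption of this sketch).
    max_buffered: hypothetical cap on left tuples held in memory at once.
    """
    # Stand-in for the right-side index: sorted keys we can binary-search.
    right_keys = [k for k, _ in right]

    for key, run in groupby(left, key=itemgetter(0)):
        while True:
            # Buffer at most max_buffered duplicate left tuples.
            batch = list(islice(run, max_buffered))
            if not batch:
                break
            # Re-seek the right side to the first tuple with this key.
            pos = bisect_left(right_keys, key)
            # Emit output for this batch, then drop the batch.
            while pos < len(right) and right[pos][0] == key:
                for _, left_value in batch:
                    yield (key, left_value, right[pos][1])
                pos += 1
```

The trade-off in this sketch is that the right-hand run for a key is re-read
once per batch, so a smaller max_buffered lowers memory use at the cost of more
seeks. The first option (spilling the accumulated tuples to disk) would avoid
the repeated seeks but adds serialization I/O on the left side.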