[ https://issues.apache.org/jira/browse/PIG-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174103#comment-14174103 ]
Praveen Rachabattuni commented on PIG-4237:
-------------------------------------------
Could you give this patch (PIG-4237-1.patch) a try? I have stopped Pig from
saving intermediate data as InternalCachedBag, which extends SelfSpillBag;
we probably have to rethink why Pig does so in the first place.
However, I plan to split the patch into separate JIRAs to give better
context on the changes.
Thanks for reporting, [~Carlos Balduz].
> Error when there is a bag inside an RDD
> ---------------------------------------
>
> Key: PIG-4237
> URL: https://issues.apache.org/jira/browse/PIG-4237
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Carlos Balduz
> Assignee: Carlos Balduz
> Priority: Critical
> Labels: spork
> Attachments: PIG-4237-1.diff
>
>
> Bags cannot be sent to an RDD, because doing so raises a
> "SelfSpillBag$MemoryLimits not serializable" exception. This results in an
> error for almost every operation performed after grouping tuples.
> The error goes away when the protected MemoryLimits memLimit attribute
> inside org.apache.pig.data.SelfSpillBag is made transient, but I do not know
> the impact of this change.
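
For anyone following along, the failure mode in the description can be sketched in isolation. This is not Pig code: Limits, BrokenBag and FixedBag are hypothetical stand-ins for MemoryLimits and SelfSpillBag, and the sketch assumes the bag goes through standard Java serialization on its way into the RDD. It also hints at one possible impact of the transient approach: the field comes back as null after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientBagSketch {

    // Hypothetical stand-in for SelfSpillBag.MemoryLimits: deliberately NOT Serializable.
    static class Limits {
        final int bagCount;
        Limits(int bagCount) { this.bagCount = bagCount; }
    }

    // Serializable bag holding a non-serializable field; writing it fails.
    static class BrokenBag implements Serializable {
        private static final long serialVersionUID = 1L;
        protected Limits memLimit;
        BrokenBag(int bagCount) { this.memLimit = new Limits(bagCount); }
    }

    // Same bag with the field marked transient, so serialization skips it.
    static class FixedBag implements Serializable {
        private static final long serialVersionUID = 1L;
        protected transient Limits memLimit;
        FixedBag(int bagCount) { this.memLimit = new Limits(bagCount); }
    }

    // Returns true if Java serialization accepts the whole object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes a FixedBag to a byte buffer and reads it back.
    static FixedBag roundTrip(FixedBag bag) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(bag);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (FixedBag) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("broken bag serializable: " + canSerialize(new BrokenBag(3)));
        System.out.println("fixed bag serializable:  " + canSerialize(new FixedBag(3)));
        System.out.println("memLimit null after round trip: "
            + (roundTrip(new FixedBag(3)).memLimit == null));
    }
}
```

If the real class behaves like this sketch, any code that reads memLimit on a deserialized bag would need to re-create it first (for example in a readObject hook), which may be part of the impact that still needs checking.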
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)