[ https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454432#comment-16454432 ]
Greg Phillips commented on PIG-5338:
------------------------------------
Thanks [~knoguchi]! I was able to run the e2e tests successfully on a small
cluster in a reasonable amount of time (220 minutes). In addition to resolving
the e2e error noted before, I've added tests, documentation, and the ability to
return a native Java DataBag from the Jython UDF. I'm not certain returning a
DataBag is the correct way to go; I may add more functionality to the JythonBag
to make it writable if that seems like a better way to proceed.
> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>
> Key: PIG-5338
> URL: https://issues.apache.org/jira/browse/PIG-5338
> Project: Pig
> Issue Type: Improvement
> Reporter: Greg Phillips
> Assignee: Greg Phillips
> Priority: Major
> Attachments: PIG-5338.001.patch, PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies of Bags, converting them into
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A
> Jython DataBag which extends PyList could allow iterative access to DataBag
> elements, performing a deep copy only when necessary.
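The lazy-copy idea in the description can be sketched in plain Java. This is a minimal, hypothetical illustration, not the code in PIG-5338.patch or PIG-5338.001.patch: `LazyBagList`, `isMaterialized`, and the conversion function are invented names, `java.util.AbstractList` stands in for Jython's PyList, an `Iterable` stands in for Pig's DataBag, and the `convert` function stands in for whatever per-tuple conversion the real UDF path performs. Iteration converts elements one at a time without holding a copy; the first indexed read or any write triggers the full copy.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not the PIG-5338 patch: S stands in for a Pig Tuple,
// T for its Jython counterpart, and Iterable<S> for the DataBag.
class LazyBagList<S, T> extends AbstractList<T> {
    private final Iterable<S> source;                        // wrapped bag (assumed)
    private final int size;                                  // size reported by the bag
    private final Function<? super S, ? extends T> convert;  // per-element conversion (assumed)
    private List<T> copy;                                    // null until a copy is needed

    LazyBagList(Iterable<S> source, int size, Function<? super S, ? extends T> convert) {
        this.source = source;
        this.size = size;
        this.convert = convert;
    }

    // Convert and copy every element once, on first random access or mutation.
    private List<T> materialize() {
        if (copy == null) {
            List<T> items = new ArrayList<>(size);
            for (S item : source) {
                items.add(convert.apply(item));
            }
            copy = items;
        }
        return copy;
    }

    boolean isMaterialized() {
        return copy != null;
    }

    // Sequential access converts one element at a time and keeps no copy.
    @Override
    public Iterator<T> iterator() {
        if (copy != null) {
            return copy.iterator();
        }
        final Iterator<S> it = source.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public T next() { return convert.apply(it.next()); }
            // remove() keeps the default, which throws UnsupportedOperationException,
            // so iteration can never modify the wrapped source.
        };
    }

    // Indexed reads and all writes go through the materialized copy.
    @Override public T get(int index) { return materialize().get(index); }
    @Override public int size() { return copy != null ? copy.size() : size; }
    @Override public T set(int index, T element) { return materialize().set(index, element); }
    @Override public void add(int index, T element) { materialize().add(index, element); }
    @Override public T remove(int index) { return materialize().remove(index); }
}
```

In this sketch a UDF that only loops over the bag never pays for a full copy, while one that indexes into or modifies it gets an ordinary in-memory list. Writes land in the copy rather than in the wrapped source, which is one possible answer to the question above of whether a writable JythonBag or a returned DataBag is preferable.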
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)