Greg Phillips updated PIG-5338:
    Status: Patch Available  (was: Open)

The first patch includes a basic Jython DataBag implementation. JythonBag extends 
PyList and overrides only the subset of methods needed for basic DataBag access 
(i.e. sequential access/iteration and access to indexed elements). Once any 
other superclass method is invoked, a deep copy is performed, and subsequent 
calls on the object access the PyList directly. This implementation allows 
Jython UDFs to access DataBags larger than would fit in memory, while still 
providing backwards compatibility with the full range of PyList methods.
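
For readers who want to see the copy-on-demand idea in isolation, here is a 
rough sketch in plain Python. It is not the code in PIG-5338.patch: LazyBag 
and open_iterator are hypothetical names, and an ordinary Python list and 
iterator stand in for Jython's PyList and Pig's DataBag. Iteration and 
non-negative integer indexing stream from the source, while any other list 
method triggers a single full copy.

```python
from itertools import islice


class LazyBag:
    """Illustrative stand-in for the JythonBag idea (hypothetical, not the patch)."""

    def __init__(self, open_iterator):
        # open_iterator: zero-argument callable returning a fresh iterator,
        # standing in for a DataBag that can be re-read on demand.
        self._open = open_iterator
        self._copy = None  # populated only when a full list is required

    def _materialize(self):
        # Perform the full copy at most once.
        if self._copy is None:
            self._copy = list(self._open())
        return self._copy

    def __iter__(self):
        # Sequential access streams from the source unless a copy exists.
        if self._copy is not None:
            return iter(self._copy)
        return self._open()

    def __getitem__(self, index):
        # Slices and negative indexes fall back to the copied list.
        if self._copy is not None or not isinstance(index, int) or index < 0:
            return self._materialize()[index]
        try:
            # Walk the stream up to the requested position without storing it.
            return next(islice(self._open(), index, None))
        except StopIteration:
            raise IndexError("bag index out of range") from None

    def __getattr__(self, name):
        # Called only for names not defined above (append, sort, ...).
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._materialize(), name)


bag = LazyBag(lambda: iter(range(5)))
total = sum(bag)                    # streams, no copy made
third = bag[2]                      # streams up to index 2, no copy made
copied_before = bag._copy is not None
bag.append(99)                      # not overridden, so the copy happens here
copied_after = bag._copy is not None
```

The trade-off in this sketch is that indexed access without a copy costs a 
walk from the start of the stream each time, so a UDF doing heavy random 
access may be better served by the copied list.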

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.patch
>
> Pig Python UDFs currently perform deep copies of Bags, converting them into 
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A 
> Jython DataBag which extends PyList could allow for iterative access to 
> DataBag elements, while only performing a deep copy when necessary.

