[ 
https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463168#comment-16463168
 ] 

Rohini Palaniswamy edited comment on PIG-5338 at 5/3/18 10:54 PM:
------------------------------------------------------------------

A really good idea. Patch looks good. Couple of comments

JythonBag.java:
1) Can you rename variables getIndex and getIterator to currIndex and 
currIterator.
2)  In list___contains__(PyObject o) method,  JythonUtils.pythonToPig(o) can be 
done once outside the loop.
3) getIterator.hasNext() is redundant in get() as we already check size() 
before. If the size and index calculation actually does not match and hasNext 
is false, then something is not right and better to get an error in that case.

JythonBagInterceptor:
1)  If JythonBag does not have the method, you are performing a deep copy. But 
shouldn't the invocation be on the PyList after that? Why call methods on the 
PyList only the next time? How does it work.
{code}
if(!jythonBagMethods.contains(methodInvocation.getMethod())){
48                      pyList = 
((JythonBag)methodInvocation.getThis()).toPyList();
49                  }
50                  return methodInvocation.proceed();
{code}

TestScriptUDF.java:
1)  Can you use assertEquals instead of assertTrue and also hardcode value of 
wordCount instead of computing it every time.
{code}
Assert.assertTrue(((DataBag) t.get(0)).size() == wordCount);
{code}


was (Author: rohini):
A really good idea. Patch looks good. Couple of comments

JythonBag.java:
1) Can you rename variables getIndex and getIterator to currIndex and iterator.
2)  In list___contains__(PyObject o) method,  JythonUtils.pythonToPig(o) can be 
done once outside the loop.
3) getIterator.hasNext() is redundant in get() as we already check size() 
before. If the size and index calculation actually does not match and hasNext 
is false, then something is not right and better to get an error in that case.

JythonBagInterceptor:
1)  If JythonBag does not have the method, you are performing a deep copy. But 
shouldn't the invocation be on the PyList after that? Why call methods on the 
PyList only the next time? How does it work.
{code}
if(!jythonBagMethods.contains(methodInvocation.getMethod())){
48                      pyList = 
((JythonBag)methodInvocation.getThis()).toPyList();
49                  }
50                  return methodInvocation.proceed();
{code}

TestScriptUDF.java:
1)  Can you use assertEquals instead of assertTrue and also hardcode value of 
wordCount instead of computing it every time.
{code}
Assert.assertTrue(((DataBag) t.get(0)).size() == wordCount);
{code}

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.001.patch, PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies on Bags converting them into 
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A 
> Jython DataBag which extends PyList could allow for iterative access to 
> DataBag elements, while only performing a deep copy when necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to