[ https://issues.apache.org/jira/browse/PIG-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017485#comment-13017485 ]

Thejas M Nair commented on PIG-1911:
------------------------------------

bq. Just to help me understand better: I think fixing PORelationToExprProject is
also possible. Since the accumulator only needs one extra bag in order for the UDF
to invoke getValue(), sending one extra bag after all batches are exhausted, and
then sending EOP, would solve the problem as well. Is that right?

I looked at that option first, but the problem is that POUserFunc expects to be
called with isAccumStarted() == false and result.returnStatus == STATUS_OK.
Consider a relation like this:

  F = foreach IN { SBCOL = order BCOL by $1; FBCOL = filter SBCOL by 1 == 2;
      generate COUNT(FBCOL.$0); }

Here FBCOL has nothing to return. With the approach you describe, the first call
to the plan would be made with isAccumStarted() == true, and
PORelationToExprProject would return an empty bag. Another call would then be
made with isAccumStarted() == false, and this time it would return STATUS_EOP.
That would mean udf.cleanup() never gets called. To avoid this, we would need to
handle STATUS_EOP differently in POUserFunc.processInput() in accumulative mode.
That seemed a little less clean than the approach I finally took.
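To make the control flow above concrete, here is a simplified, self-contained Java sketch. It is not Pig's actual POUserFunc code: the `Status` enum, the `SimpleAccumulator` interface, `RecordingAccumulator` and the `dispatch` method are all hypothetical stand-ins. The sketch assumes that getValue() and cleanup() only run when the call made with isAccumStarted() == false receives a STATUS_OK input, which is the expectation described above. Under that assumption, an empty relation (like FBCOL in the example) handled with the "extra bag, then EOP" approach never reaches cleanup().

```java
import java.util.ArrayList;
import java.util.List;

public class AccumDispatchSketch {

    // Hypothetical stand-ins for Pig's POStatus codes.
    enum Status { OK, BATCH_OK, EOP }

    // Hypothetical, simplified accumulator contract (no Pig dependency).
    interface SimpleAccumulator<T> {
        void accumulate(List<Object> batch);
        T getValue();
        void cleanup();
    }

    // Records which methods were called so the two paths can be compared.
    static class RecordingAccumulator implements SimpleAccumulator<Long> {
        final List<String> calls = new ArrayList<>();
        long count = 0;

        public void accumulate(List<Object> batch) {
            calls.add("accumulate");
            count += batch.size();
        }

        public Long getValue() {
            calls.add("getValue");
            return count;
        }

        public void cleanup() {
            calls.add("cleanup");
            count = 0;
        }
    }

    // Assumed dispatch logic: inputStatus is what the input plan (for example
    // a PORelationToExprProject-like operator) returned for this call.
    static <T> T dispatch(SimpleAccumulator<T> udf, boolean accumStarted,
                          Status inputStatus, List<Object> batch) {
        if (inputStatus != Status.OK) {
            // Non-OK input such as EOP is passed through:
            // neither getValue() nor cleanup() runs.
            return null;
        }
        if (accumStarted) {
            udf.accumulate(batch);
            return null;
        }
        T value = udf.getValue();
        udf.cleanup();
        return value;
    }

    public static void main(String[] args) {
        List<Object> empty = new ArrayList<>();

        // Final call sees OK: the UDF is finalized and cleaned up.
        RecordingAccumulator ok = new RecordingAccumulator();
        dispatch(ok, true, Status.OK, empty);
        Long result = dispatch(ok, false, Status.OK, empty);
        System.out.println(ok.calls + " -> " + result); // [accumulate, getValue, cleanup] -> 0

        // Final call sees EOP: cleanup() is never reached.
        RecordingAccumulator eop = new RecordingAccumulator();
        dispatch(eop, true, Status.OK, empty);
        dispatch(eop, false, Status.EOP, empty);
        System.out.println(eop.calls); // [accumulate]
    }
}
```

The comparison isolates the point made in the comment: making the EOP path work would require changing how the final call treats a non-OK status, rather than only changing what the projection operator emits.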


> Infinite loop with accumulator function in nested foreach
> ---------------------------------------------------------
>
>                 Key: PIG-1911
>                 URL: https://issues.apache.org/jira/browse/PIG-1911
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Olga Natkovich
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1911.08.1.patch, PIG-1911.trunk.1.patch
>
>
> Sample script:
> register v_udf.jar;
> a = load '2records' as (f1:chararray,f2:chararray);
> b = group a by f1;
> d = foreach b { sort = order a by f1; 
>   generate org.udfs.MyCOUNT(sort) as something ; }
> dump d;
> This causes an infinite loop if MyCOUNT implements the Accumulator interface.
> The workaround is to move the function out of the nested foreach into a
> separate foreach statement.
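MyCOUNT's source is not included in the report. For readers unfamiliar with the Accumulator interface, the following is a hypothetical, simplified Java sketch of a COUNT-style accumulator: the UDF receives a group's bag in batches through accumulate(), returns the total from getValue(), and resets its state in cleanup() so the same instance can be reused for the next key. The `BatchAccumulator` interface and `AccumCount` class are illustrative only; Pig's real org.apache.pig.Accumulator receives each batch wrapped in a Tuple and its accumulate() declares IOException, both of which this sketch omits so it runs without the Pig jar.

```java
import java.util.Arrays;
import java.util.List;

public class AccumCountSketch {

    // Hypothetical, simplified accumulator contract (no Pig dependency).
    interface BatchAccumulator<T> {
        void accumulate(List<?> batch); // called once per batch of the bag
        T getValue();                   // called after the last batch
        void cleanup();                 // resets state before the next key
    }

    // COUNT-style accumulator: adds up the sizes of the batches it sees.
    static class AccumCount implements BatchAccumulator<Long> {
        private long count = 0;

        public void accumulate(List<?> batch) {
            count += batch.size();
        }

        public Long getValue() {
            return count;
        }

        public void cleanup() {
            count = 0;
        }
    }

    public static void main(String[] args) {
        AccumCount udf = new AccumCount();

        // One group whose bag arrives in two batches.
        udf.accumulate(Arrays.asList("r1", "r2"));
        udf.accumulate(Arrays.asList("r3"));
        System.out.println(udf.getValue()); // 3
        udf.cleanup();

        // The next group reuses the same instance after cleanup().
        udf.accumulate(Arrays.asList("r4"));
        System.out.println(udf.getValue()); // 1
    }
}
```

Keeping the running total in a field and resetting it in cleanup() is what lets such a UDF process large bags without materializing them, which is why skipping cleanup() matters.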

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
