[https://issues.apache.org/jira/browse/PIG-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017485#comment-13017485]
Thejas M Nair commented on PIG-1911:
------------------------------------
bq. Just to help me understand better: I think fixing PORelationToExprProject is
also possible, since the accumulator only needs one extra bag in order for the
UDF to invoke getValue(). So after exhausting all batches, sending one extra bag
and then EOP would solve the problem as well. Is that right?
I looked at that option first, but the problem is that POUserFunc expects to be
called with isAccumStarted() == false and result.returnStatus == STATUS_OK.
Consider a relation like:
F = foreach IN {
    SBCOL = order BCOL by $1;
    FBCOL = filter SBCOL by 1 == 2;
    generate COUNT(FBCOL.$0);
}
Here FBCOL has nothing to return. With the approach you describe, the first call
to the plan is made with isAccumStarted() == true, and PORelationToExprProject
returns an empty bag. A second call is then made with isAccumStarted() == false,
and this time it returns STATUS_EOP. That means udf.cleanup() never gets called.
To avoid this, we would need to handle STATUS_EOP differently in
POUserFunc.processInput() in accumulative mode, which seemed a little less clean
than the approach I finally took.
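A rough model of the status handling described above, in plain Java with no Pig dependency. Every class and method here (CountUdf, finalCall, runEmptyGroup) is hypothetical and is not Pig's actual POUserFunc or PORelationToExprProject code; it only assumes, as the comment states, that getValue() and cleanup() run on the final isAccumStarted() == false call only when that call receives STATUS_OK.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the accumulative-mode contract discussed above.
// It mimics only the status flow, not Pig's real physical operators.
class AccumContractSketch {

    enum Status { OK, EOP }

    // Stand-in for an Accumulator UDF such as COUNT (accumulate/getValue/cleanup).
    static class CountUdf {
        long count = 0;
        boolean cleanupCalled = false;

        void accumulate(List<?> bag) { count += bag.size(); }
        Long getValue() { return count; }
        void cleanup() { count = 0; cleanupCalled = true; }
    }

    // Simplified final (non-accumulating) call: getValue() and cleanup()
    // run only when the input arrives with Status.OK.
    static Long finalCall(Status inputStatus, CountUdf udf) {
        if (inputStatus != Status.OK) {
            return null; // status is passed through; cleanup() is skipped
        }
        Long value = udf.getValue();
        udf.cleanup();
        return value;
    }

    // Runs one group whose nested relation (like FBCOL) is empty.
    static CountUdf runEmptyGroup(Status finalInputStatus) {
        CountUdf udf = new CountUdf();
        // isAccumStarted() == true: the projection hands over an empty bag.
        udf.accumulate(Collections.emptyList());
        // isAccumStarted() == false: the final call.
        finalCall(finalInputStatus, udf);
        return udf;
    }

    public static void main(String[] args) {
        // Rejected approach: the final call sees STATUS_EOP.
        System.out.println("EOP on final call -> cleanup called: "
                + runEmptyGroup(Status.EOP).cleanupCalled);
        // Contract POUserFunc expects: the final call sees STATUS_OK.
        System.out.println("OK on final call  -> cleanup called: "
                + runEmptyGroup(Status.OK).cleanupCalled);
    }
}
```

Running main prints false for the EOP case and true for the OK case, which is the missed cleanup() that the alternative would have introduced unless processInput() special-cased STATUS_EOP.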
> Infinite loop with accumulator function in nested foreach
> ---------------------------------------------------------
>
> Key: PIG-1911
> URL: https://issues.apache.org/jira/browse/PIG-1911
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Olga Natkovich
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1911.08.1.patch, PIG-1911.trunk.1.patch
>
>
> Sample script:
> register v_udf.jar;
> a = load '2records' as (f1:chararray,f2:chararray);
> b = group a by f1;
> d = foreach b {
>     sort = order a by f1;
>     generate org.udfs.MyCOUNT(sort) as something;
> }
> dump d;
> This causes an infinite loop if MyCOUNT implements the Accumulator interface.
> The workaround is to take the function out of nested foreach into a separate
> foreach statement.