[
https://issues.apache.org/jira/browse/PIG-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015661#comment-13015661
]
Thejas M Nair commented on PIG-1963:
------------------------------------
The MYCONCATBAG UDF in the query in the description concatenates the entries in the
bag in the order they are received.
When the query is run with the property pig.accumulative.batchsize=2
and the input -
{code}
100 apple
200 orange
300 strawberry
300 pear
100 apple
300 pear
400 apple
{code}
it gives the output -
{code}
(100,(100)(100),(apple)(apple))
(200,(200),(orange))
(300,(300)(300)(300),(pear)(strawberry)(pear)) -- this should be:
-- (300,(300)(300)(300),(pear)(pear)(strawberry))
(400,(400),(apple))
{code}
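The expected result for key 300 can be checked outside Pig with a small simulation. This is only a sketch under stated assumptions: it is plain Python, it does not use any Pig API, and the names ConcatBag, batches and expected_output are hypothetical stand-ins. It groups the sample input by f1, orders each group by f2, and feeds the ordered rows to a concatenating accumulator in batches of 2 (mirroring pig.accumulative.batchsize=2). It shows what the output should be, not why Pig produces the wrong order.

```python
# Simulation only: not Pig code, no Pig API involved.
from itertools import islice

# Sample input from this comment, as (f1, f2) rows.
ROWS = [
    (100, "apple"),
    (200, "orange"),
    (300, "strawberry"),
    (300, "pear"),
    (100, "apple"),
    (300, "pear"),
    (400, "apple"),
]


class ConcatBag:
    """Hypothetical stand-in for MYCONCATBAG: appends entries in arrival order."""

    def __init__(self):
        self.parts = []

    def accumulate(self, batch):
        # Called once per batch; keeps the order in which values arrive.
        for value in batch:
            self.parts.append("(%s)" % value)

    def get_value(self):
        return "".join(self.parts)


def batches(values, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def expected_output(rows, batch_size=2):
    """Group by f1, order each group by f2, then accumulate in batches."""
    groups = {}
    for f1, f2 in rows:
        groups.setdefault(f1, []).append((f1, f2))
    result = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[1])  # order by f2
        c1, c2 = ConcatBag(), ConcatBag()
        for chunk in batches(ordered, batch_size):
            c1.accumulate([r[0] for r in chunk])
            c2.accumulate([r[1] for r in chunk])
        result.append("(%d,%s,%s)" % (key, c1.get_value(), c2.get_value()))
    return result


for line in expected_output(ROWS):
    print(line)
```

With the ordering preserved across batches, the row for key 300 comes out as (pear)(pear)(strawberry), and the result does not depend on the batch size.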
> in nested foreach, accumulative udf taking input from order-by does not get
> results in order
> ------------------------------------------------------------------------------------------
>
> Key: PIG-1963
> URL: https://issues.apache.org/jira/browse/PIG-1963
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Thejas M Nair
>
> This happens only when secondary sort is not being used for the order-by.
> For example -
> {code}
> a1 = load 'fruits.txt' as (f1:int,f2);
> a2 = load 'fruits.txt' as (f1:int,f2);
> b = cogroup a1 by f1, a2 by f1;
> d = foreach b {
> sort1 = order a1 by f2;
> sort2 = order a2 by f2; -- secondary sort not getting used here,
> -- MYCONCATBAG gets results in wrong order
> generate group, MYCONCATBAG(sort1.f1), MYCONCATBAG(sort2.f2);
> }
> -- explain d;
> dump d;
> {code}