[ 
https://issues.apache.org/jira/browse/PIG-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4294:
----------------------------------
    Attachment: PIG-4294.patch

The GROUP operator does not guarantee the order of its output. Different engines,
such as Spark and MapReduce, can therefore return the same rows in a different
order. For example, take the following script, group.pig:
{code}
A = load 'table_nf_project' as (a,b,c:chararray);
B = GROUP A BY a;
C = foreach B {tmp = A.a;generate A, tmp; };
D = foreach C generate A.(a,b) as v;
dump D;
{code}

The result on the Spark engine is:
({(2,5)})
({(1,2)})

The result on the MapReduce engine is:
({(1,2)})
({(2,5)})

Some unit tests fail because expectedResult differs from actualResult in row
order alone. PIG-4294.patch fixes this problem.
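
As an illustration only (not necessarily what the attached patch does), one way
to make such a check independent of row order is to sort both sides before
comparing them. The class and method names below are hypothetical, and rows are
modelled as plain strings rather than Pig tuples to keep the sketch
self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderInsensitiveCompare {

    // Hypothetical helper: returns true when both lists contain the same rows,
    // with the same multiplicities, regardless of the order they arrived in.
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        // Copy first so the caller's lists are not reordered.
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        // The two outputs quoted above, one row per string.
        List<String> mapreduce = Arrays.asList("({(1,2)})", "({(2,5)})");
        List<String> spark = Arrays.asList("({(2,5)})", "({(1,2)})");

        // A plain list comparison is order-sensitive and reports a mismatch.
        System.out.println(mapreduce.equals(spark));                  // prints "false"
        // Sorting both sides first treats the two outputs as equal.
        System.out.println(sameRowsIgnoringOrder(mapreduce, spark));  // prints "true"
    }
}
```

Sorting copies of the lists, rather than converting them to sets, keeps
duplicate rows significant, so a result that drops or repeats a row still fails
the comparison.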

> Enable unit test "TestNestedForeach" for spark
> ----------------------------------------------
>
>                 Key: PIG-4294
>                 URL: https://issues.apache.org/jira/browse/PIG-4294
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>         Attachments: PIG-4294.patch, 
> TEST-org.apache.pig.test.TestNestedForeach.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
