[ 
https://issues.apache.org/jira/browse/PIG-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4282:
----------------------------------
    Attachment: PIG-4282.patch

The group operator returns its results in a different order depending on the
execution engine, e.g. "spark" versus "mapreduce".
For example, groupdistinct.pig:
{code}
A = load 'input1.txt' as (age:int,gpa:int); 
B = group A by age;  
C = foreach B { 
 D = A.gpa; 
 E = distinct D;
 generate group, MIN(E);
};
dump C;
{code}

The input file input1.txt contains:
10      89
20      78
10      68
10      89
20      92

The mapreduce output is:
(10,68),(20,78)

The spark output is:
(20,78),(10,68)

All test cases of TestForEachNestedPlan pass in spark mode except
TestInnerDistinct. It fails for the reason described above: the original code
only accepts "(10,68),(20,78)" as the result. PIG-4282.patch accepts both
"(10,68),(20,78)" and "(20,78),(10,68)" as the result.
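
For illustration only, here is a minimal sketch of an order-insensitive
comparison. It is not the content of PIG-4282.patch, which accepts the two
known orderings. The sketch sorts both row lists before comparing them. The
class and method names (UnorderedResultCheck, sameRowsIgnoringOrder) are
hypothetical, and rows are modelled as plain strings rather than Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Pig or of the attached patch.
class UnorderedResultCheck {

    // Returns true when both lists hold the same rows, ignoring row order.
    // Sorting copies of the lists compares them as multisets, so a
    // duplicated or missing row still makes the check fail.
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(10,68)", "(20,78)");
        List<String> mapreduce = Arrays.asList("(10,68)", "(20,78)");
        List<String> spark = Arrays.asList("(20,78)", "(10,68)");

        System.out.println(sameRowsIgnoringOrder(expected, mapreduce)); // true
        System.out.println(sameRowsIgnoringOrder(expected, spark));     // true
    }
}
```

Compared with listing every accepted ordering, a check of this kind would not
need to change if the script produced more than two groups.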





> Enable unit test "TestForEachNestedPlan" for spark
> --------------------------------------------------
>
>                 Key: PIG-4282
>                 URL: https://issues.apache.org/jira/browse/PIG-4282
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>         Attachments: PIG-4282.patch, 
> TEST-org.apache.pig.test.TestForEachNestedPlan.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
