[
https://issues.apache.org/jira/browse/PIG-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4294:
----------------------------------
Attachment: PIG-4294.patch
The GROUP operator does not guarantee the order of its results. Different
engines, such as Spark and MapReduce, can therefore return the same rows in a
different order.
For example, with group.pig:
{code}
A = load 'table_nf_project' as (a,b,c:chararray);
B = GROUP A BY a;
C = foreach B {tmp = A.a;generate A, tmp; };
D = foreach C generate A.(a,b) as v;
dump D;
{code}
The result on the Spark engine is:
({(2,5)})
({(1,2)})
The result on the MapReduce engine is:
({(1,2)})
({(2,5)})
Some unit tests fail because the expectedResult differs from the
actualResult in row order. PIG-4294.patch fixes the problem above.
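As an illustration only (not necessarily what PIG-4294.patch does), a test can avoid depending on row order by sorting both result lists before comparing them. The sketch below is plain Java with no Pig dependency; the class name OrderInsensitiveCompare and the helper sameRowsIgnoringOrder are hypothetical, and rows are modeled as their printed string form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderInsensitiveCompare {

    // Hypothetical helper: returns true when both lists contain the same
    // rows (including duplicates), regardless of the order they arrive in.
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        // Copy first so the callers' lists are not modified by sorting.
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        // The two outputs quoted in this issue, as printed by "dump D".
        List<String> mapreduce = Arrays.asList("({(1,2)})", "({(2,5)})");
        List<String> spark = Arrays.asList("({(2,5)})", "({(1,2)})");

        // A positional comparison treats the two results as different.
        System.out.println(mapreduce.equals(spark));                 // false
        // Comparing after sorting treats them as the same set of rows.
        System.out.println(sameRowsIgnoringOrder(mapreduce, spark)); // true
    }
}
```

Sorting copies, rather than converting to a set, keeps duplicate rows significant, so a result with a repeated row does not match one where that row appears only once.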
> Enable unit test "TestNestedForeach" for spark
> ----------------------------------------------
>
> Key: PIG-4294
> URL: https://issues.apache.org/jira/browse/PIG-4294
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: liyunzhang_intel
> Attachments: PIG-4294.patch,
> TEST-org.apache.pig.test.TestNestedForeach.txt
>
>
> Error log is attached.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)