Hi all,

I have run into a problem: the group operator returns its results in a different order on different execution engines such as "spark" and "mapreduce" (PIG-4282 <https://issues.apache.org/jira/browse/PIG-4282>).

groupdistinct.pig:

    A = load 'input1.txt' as (age:int,gpa:int);
    B = group A by age;
    C = foreach B {
        D = A.gpa;
        E = distinct D;
        generate group, MIN(E);
    };
    dump C;

input1.txt is:

    10 89
    20 78
    10 68
    10 89
    20 92

The mapreduce output is:

    (10,68),(20,78)

The spark output is:

    (20,78),(10,68)

The two results contain the same tuples and differ only in the order of the "group" values. Is there any way to guarantee that the "group" field follows the same order as the input when using the group operator in Pig?

Best regards,
Zhang, Liyun
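
P.S. One workaround I am considering, in case it helps frame the question: as far as I understand, Pig relations are unordered unless they are sorted explicitly, so an ORDER BY before the dump should make both engines print the tuples in the same order. The following is only a minimal sketch, assuming that sorting by the group key is acceptable; the alias F is hypothetical and not part of the original script. Note that this sorts by key value, which is not the same as preserving the order in which keys first appear in the input.

```pig
-- Same script as groupdistinct.pig, with an explicit sort added before the dump.
A = load 'input1.txt' as (age:int,gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    generate group, MIN(E);
};
-- F is a hypothetical alias: sort the grouped result by the group key (age)
-- so the dump order no longer depends on the execution engine.
F = order C by group;
dump F;
```

If the order of the original input really has to be preserved, this sketch would not be enough, which is why I am asking whether group itself can guarantee it.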