I have usually fixed these kinds of tests by adding an order by, as I did when I added new tests for Union on Tez. In this case you can add an order by after the distinct in the nested foreach.
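A minimal sketch of what that could look like for the script quoted below. It is not a tested patch. The aliases F, G, min_gpa and the renaming of group to age are illustrative assumptions, not names from the original test. Note that a nested order by only fixes the order of tuples inside the bag; since the two outputs in the report differ in the order of the rows, the sketch also sorts the final relation before the dump, assuming that is acceptable for the test.

```pig
-- Sketch only: the script from the quoted message with ordering added.
A = load 'input1.txt' as (age:int, gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    -- Nested order by: makes the order of tuples inside the bag
    -- deterministic (it does not change the result of MIN).
    F = order E by gpa;
    generate group as age, MIN(F) as min_gpa;
};
-- Top-level order by: makes the order of the output rows deterministic,
-- so the dump can be compared across execution engines.
G = order C by age;
dump G;
```

With the input from the report, the row for age 10 should then come before the row for age 20 on every engine.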
Daniel, any better suggestions?

Regards,
Rohini

On Wed, Dec 17, 2014 at 10:38 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:
>
> Hi all,
> I ran into a problem: the "group" operator produces different results on
> different engines such as "spark" and "mapreduce" (PIG-4282
> <https://issues.apache.org/jira/browse/PIG-4282>).
>
> groupdistinct.pig:
> A = load 'input1.txt' as (age:int,gpa:int);
> B = group A by age;
> C = foreach B {
>     D = A.gpa;
>     E = distinct D;
>     generate group, MIN(E);
> };
> dump C;
>
> input1.txt is:
> 10 89
> 20 78
> 10 68
> 10 89
> 20 92
>
> The mapreduce output is:
> (10,68),(20,78)
> The spark output is:
> (20,78),(10,68)
>
> The two results differ because the rows for the "group" field come out
> in a different order.
>
> Is there any way to guarantee that the "group" field follows the order
> of the input when using the "group" operator in Pig?
>
> Best regards
> Zhang, Liyun