Re: Is there any way to guarantee the sequence of “group” field as the input when using “group” operator in pig

Rohini Palaniswamy Mon, 22 Dec 2014 18:42:06 -0800

When I added the new Union tests for Tez, I usually fixed this kind of
test by adding an order by. In this case you can add an order by
after the distinct in the nested foreach.
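
A rough sketch of that change, based on the groupdistinct.pig script quoted
below. The aliases F and G and the field names age and min_gpa are
illustrative, not from the original script. The final order by on the group
key is an extra assumption, added so that the dumped rows are directly
comparable across engines; it is not part of the suggestion above.

```pig
A = load 'input1.txt' as (age:int, gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    -- order by placed after the distinct, inside the nested foreach
    F = order E by gpa;
    generate group as age, MIN(F) as min_gpa;
};
-- assumption: sort the result on the group key so dump order is stable
G = order C by age;
dump G;
```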


Daniel,
    Any better suggestions?

Regards,
Rohini

On Wed, Dec 17, 2014 at 10:38 PM, Zhang, Liyun <liyun.zh...@intel.com>
wrote:
>
> Hi all,
>    I ran into a problem: the “group” operator produces different results on
> different engines such as "spark" and "mapreduce" (PIG-4282<
> https://issues.apache.org/jira/browse/PIG-4282>).
>
> groupdistinct.pig
> A = load 'input1.txt' as (age:int,gpa:int);
> B = group A by age;
> C = foreach B {
>  D = A.gpa;
>  E = distinct D;
>  generate group, MIN(E);
> };
> dump C;
> input1.txt is:
> 10 89
> 20 78
> 10 68
> 10 89
> 20 92
> the mapreduce output is:
> (10,68),(20,78)
> the spark output is
> (20,78),(10,68)
> The two results differ because the ‘group’ keys come out in a different
> order.
>
> Is there any way to guarantee the order of the “group” field in the output
> when using the “group” operator in Pig?
>
>
> Best regards
> Zhang,Liyun
>
>
