ORDER BY is broken when in combination with LIMIT and FLATTEN
-------------------------------------------------------------

                 Key: PIG-2236
                 URL: https://issues.apache.org/jira/browse/PIG-2236
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.1, 0.8.0
            Reporter: Sungho Ryu


ORDER BY does not correctly sort the result when used in combination with LIMIT 
and FOREACH / FLATTEN.

---  Input data

A   1000
A   128
A   127
A   0
A   1
A   2
B   0
B   1
B   128
B   1001
B   2
B   127
C   0
C   1
C   128
C   1000
C   127
C   2
D   0
D   1
D   128
D   1000
D   2
D   127


---  Test script

data =  LOAD 'data' AS (k:chararray, v:int);

grouped = GROUP data BY k;

limited = LIMIT grouped 2;

output = FOREACH limited {
        ordered = ORDER data BY v;
        GENERATE FLATTEN(ordered);
};

output = LIMIT output 10000;  -- a workaround for PIG-2231

STORE output INTO 'result';

---  Desired output
A       0
A       1
A       2
A       127
A       128
A       1000
B       0
B       1
B       2
B       127
B       128
B       1001


---  Actual output
A       0
A       1
A       128
A       1000
A       2
A       127
B       0
B       1
B       128
B       1001
B       2
B       127

--------------

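As the result shows, ORDER BY does not correctly sort values in [2,128): within
each group they appear after 128 and 1000 (or 1001) instead of before them.
This happens when LIMIT is applied before or after the FOREACH.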
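
One way to check a result for this problem is sketched below in Python. It is
not part of the original report: the helper names `parse` and
`unsorted_groups` are hypothetical, and the sample rows are copied from the
desired and actual output listings above. It only reports which keys have
values out of ascending order and says nothing about the cause.

```python
from itertools import groupby


def parse(text):
    """Parse whitespace-separated 'key value' lines into (str, int) pairs."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            rows.append((key, int(value)))
    return rows


def unsorted_groups(rows):
    """Return the keys whose values are not in ascending order.

    Assumes rows for the same key are adjacent, as in the listings above.
    """
    bad = []
    for key, group in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in group]
        if values != sorted(values):
            bad.append(key)
    return bad


# Rows copied from the "Desired output" and "Actual output" listings.
desired = parse("""
A 0\nA 1\nA 2\nA 127\nA 128\nA 1000
B 0\nB 1\nB 2\nB 127\nB 128\nB 1001
""")
actual = parse("""
A 0\nA 1\nA 128\nA 1000\nA 2\nA 127
B 0\nB 1\nB 128\nB 1001\nB 2\nB 127
""")

print(unsorted_groups(desired))  # no keys reported
print(unsorted_groups(actual))   # both A and B reported
```

Running the same check on the file written by STORE (after parsing its
tab-separated lines) should report no keys once the sort behaves as expected.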

If I remove both LIMIT statements, I get the correct result (tested on
0.8.0 and 0.8.1).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
