ORDER BY is broken when in combination with LIMIT and FLATTEN
-------------------------------------------------------------
Key: PIG-2236
URL: https://issues.apache.org/jira/browse/PIG-2236
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.1, 0.8.0
Reporter: Sungho Ryu
ORDER BY does not correctly sort the result when used in combination with LIMIT
and FOREACH / FLATTEN.
--- Input data
A 1000
A 128
A 127
A 0
A 1
A 2
B 0
B 1
B 128
B 1001
B 2
B 127
C 0
C 1
C 128
C 1000
C 127
C 2
D 0
D 1
D 128
D 1000
D 2
D 127
--- Test script
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped 2;
output = FOREACH limited {
    ordered = ORDER data BY v;
    GENERATE FLATTEN(ordered);
};
output = LIMIT output 10000; -- a workaround for PIG-2231
STORE output INTO 'result';
--- Desired output
A 0
A 1
A 2
A 127
A 128
A 1000
B 0
B 1
B 2
B 127
B 128
B 1001
--- Actual output
A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127
--------------
As the result shows, ORDER BY misplaces values in [2,128) when LIMIT is applied
before or after the FOREACH: within each group, 2 and 127 come after the larger
values instead of between 1 and 128.
If I remove both LIMIT statements, I get the correct result (tested on 0.8.0
and 0.8.1).
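For comparison, the variant with both LIMIT statements removed would look
roughly like the following. This is a sketch reconstructed from the test script
in this report, not the exact script that was run; the alias sorted_rows and the
output path 'result_no_limit' are placeholder names chosen for illustration.

```pig
-- Same pipeline as the test script, with both LIMIT statements removed.
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;

-- sorted_rows is a placeholder alias; no LIMIT before the FOREACH.
sorted_rows = FOREACH grouped {
    ordered = ORDER data BY v;      -- numeric sort on v within each group
    GENERATE FLATTEN(ordered);
};

-- 'result_no_limit' is a placeholder path; no LIMIT before the STORE.
STORE sorted_rows INTO 'result_no_limit';
```

Since this variant has no LIMIT, it writes all four groups (A through D) rather
than only the two kept by the test script.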