[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sungho Ryu updated PIG-2236:
----------------------------
Description:
ORDER BY does not correctly sort the result when used in combination with LIMIT
and FOREACH / FLATTEN.
--- Input data
A 1000
A 128
A 127
A 0
A 1
A 2
B 0
B 1
B 128
B 1001
B 2
B 127
C 0
C 1
C 128
C 1000
C 127
C 2
D 0
D 1
D 128
D 1000
D 2
D 127
----- Test script
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped 2;
output = FOREACH limited {
ordered = ORDER data BY v;
GENERATE FLATTEN(ordered);
};
output = LIMIT output 10000; -- a workaround for PIG-2231
STORE output INTO 'result';
---- Desired output
A 0
A 1
A 2
A 127
A 128
A 1000
B 0
B 1
B 2
B 127
B 128
B 1001
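The expected semantics can be stated as a small reference model. The Python sketch below computes what the script should produce for the input above. It models the script's intended behaviour and is not Pig itself. It assumes that `LIMIT grouped 2` keeps the A and B groups, as the desired output implies; Pig's LIMIT does not by itself guarantee which tuples are kept. The names `INPUT`, `expected_output` and `group_limit` are illustrative.

```python
from itertools import groupby

# Input rows as (k, v) tuples, copied from "Input data" above.
INPUT = [
    ("A", 1000), ("A", 128), ("A", 127), ("A", 0), ("A", 1), ("A", 2),
    ("B", 0), ("B", 1), ("B", 128), ("B", 1001), ("B", 2), ("B", 127),
    ("C", 0), ("C", 1), ("C", 128), ("C", 1000), ("C", 127), ("C", 2),
    ("D", 0), ("D", 1), ("D", 128), ("D", 1000), ("D", 2), ("D", 127),
]

def expected_output(rows, group_limit=2):
    """Model of: GROUP BY k, LIMIT n groups, nested ORDER BY v, FLATTEN.

    Assumption (not guaranteed by Pig): LIMIT keeps the first
    `group_limit` groups in key order.
    """
    result = []
    # GROUP data BY k: sort by key so groupby sees each key exactly once.
    groups = groupby(sorted(rows, key=lambda r: r[0]), key=lambda r: r[0])
    for i, (_, bag) in enumerate(groups):
        if i >= group_limit:  # LIMIT grouped 2
            break
        # ORDER data BY v: v is declared as int, so compare numerically.
        result.extend(sorted(bag, key=lambda r: r[1]))
    return result

for k, v in expected_output(INPUT):
    print(k, v)
```

The printed rows match the "Desired output" listing. Numeric sorting of an int field puts 2 and 127 before 128.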
--- Actual output
A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127
--------------
As the result shows, ORDER BY does not correctly sort the values in [2,128) when
LIMIT is applied before or after the nested FOREACH: 2 and 127 end up after
128 and 1000 (or 1001).
If I remove both LIMIT statements, I get the correct result. (Tested on
0.8.0 and 0.8.1.)
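The claim about [2,128) can be checked mechanically against the sample output. The Python sketch below uses illustrative function names. For each key, the actual output is not numerically sorted, but it becomes sorted once the values in [2,128) are dropped. In other words, only those values are out of place in this run. The check covers only the two sample groups and says nothing about why Pig produces this order.

```python
# Per-key values copied from "Actual output" above.
ACTUAL = {
    "A": [0, 1, 128, 1000, 2, 127],
    "B": [0, 1, 128, 1001, 2, 127],
}

def is_sorted(values):
    """True if values are in non-decreasing numeric order."""
    return all(a <= b for a, b in zip(values, values[1:]))

def without_range(values, lo=2, hi=128):
    """Drop values in the half-open interval [lo, hi)."""
    return [v for v in values if not (lo <= v < hi)]

for key, values in ACTUAL.items():
    # Prints: key, whether the bag is sorted, whether it is sorted without [2,128)
    print(key, is_sorted(values), is_sorted(without_range(values)))
```

For both A and B this prints `False True`.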
was:
ORDER BY does not correctly sort the result when used in combination with LIMIT
and FOREACH / FLATTEN.
--- Input data
A 1000
A 128
A 127
A 0
A 1
A 2
B 0
B 1
B 128
B 1001
B 2
B 127
C 0
C 1
C 128
C 1000
C 127
C 2
D 0
D 1
D 128
D 1000
D 2
D 127
----- Test script
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped BY 2;
output = FOREACH limited {
ordered = ORDER data BY v;
GENERATE FLATTEN(ordered);
};
output = LIMIT output 10000; -- a workaround for PIG-2231
STORE output INTO 'result';
---- Desired output
A 0
A 1
A 2
A 127
A 128
A 1000
B 0
B 1
B 2
B 127
B 128
B 1001
--- Actual output
A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127
--------------
As the result shows, ORDER BY does not correctly sort numbers in [2,128) when
LIMIT is applied before or after.
If I remove both LIMIT statements, I get the correct result. (Tested on
0.8.0 and 0.8.1.)
> ORDER BY is broken when in combination with LIMIT and FLATTEN
> -------------------------------------------------------------
>
> Key: PIG-2236
> URL: https://issues.apache.org/jira/browse/PIG-2236
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0, 0.8.1
> Reporter: Sungho Ryu
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira