[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090028#comment-13090028 ]
Daniel Dai commented on PIG-2236:
---------------------------------
Yes, it is a side effect of PIG-2231. If you apply the PIG-2231 patch, the
result is sorted correctly.
> ORDER BY is broken when in combination with LIMIT and FLATTEN
> -------------------------------------------------------------
>
> Key: PIG-2236
> URL: https://issues.apache.org/jira/browse/PIG-2236
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0, 0.8.1
> Reporter: Sungho Ryu
>
> ORDER BY does not correctly sort the result when used in combination with
> LIMIT and FOREACH / FLATTEN.
> --- Input data
> A 1000
> A 128
> A 127
> A 0
> A 1
> A 2
> B 0
> B 1
> B 128
> B 1001
> B 2
> B 127
> C 0
> C 1
> C 128
> C 1000
> C 127
> C 2
> D 0
> D 1
> D 128
> D 1000
> D 2
> D 127
> ----- Test script
> data = LOAD 'data' AS (k:chararray, v:int);
> grouped = GROUP data BY k;
> limited = LIMIT grouped 2;
> output = FOREACH limited {
> ordered = ORDER data BY v;
> GENERATE FLATTEN(ordered);
> };
> output = LIMIT output 10000; -- a workaround for PIG-2231
> STORE output INTO 'result';
> ---- Desired output
> A 0
> A 1
> A 2
> A 127
> A 128
> A 1000
> B 0
> B 1
> B 2
> B 127
> B 128
> B 1001
> --- Actual output
> A 0
> A 1
> A 128
> A 1000
> A 2
> A 127
> B 0
> B 1
> B 128
> B 1001
> B 2
> B 127
> --------------
> As the result shows, ORDER BY does not correctly sort numbers in [2,128) when
> a LIMIT is applied before or after it.
> If I remove both LIMIT statements, I get the correct result. (Tested
> on 0.8.0 and 0.8.1.)
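
One way to confirm the symptom on a stored result is to check that each key's values appear in ascending order. The following is a minimal Python sketch and not part of the original report: the helper name `unsorted_groups` is hypothetical, and it assumes whitespace-separated `key value` lines in which all rows for a key are contiguous, as in the outputs quoted above.

```python
from itertools import groupby


def unsorted_groups(lines):
    """Return the keys whose values are not in ascending numeric order.

    Assumes each non-empty line is "<key> <int value>" and that rows
    sharing a key are adjacent (hypothetical helper for illustration).
    """
    rows = [line.split() for line in lines if line.strip()]
    bad = []
    for key, group in groupby(rows, key=lambda row: row[0]):
        values = [int(row[1]) for row in group]
        if values != sorted(values):
            bad.append(key)
    return bad


# The "Actual output" quoted in the report.
actual = """A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127""".splitlines()

print(unsorted_groups(actual))  # prints ['A', 'B']
```

Running the same check on the "Desired output" listing should return an empty list, which makes it easy to compare runs with and without the PIG-2231 patch.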