[ 
https://issues.apache.org/jira/browse/PIG-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504138#comment-14504138
 ] 

Mohit Sabharwal commented on PIG-4276:
--------------------------------------

Thanks, [~kellyzly]. 

I avoided using Util.checkQueryOutputsAfterSort in both Spark and non-Spark 
modes because, in our email thread with [~rohini], she preferred not to change 
MR/Tez behavior (so that we notice if these engines change their current 
behavior).
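
To illustrate the idea of sorting before comparison only in Spark mode, here 
is a minimal, self-contained sketch. The names OrderAwareCheck, sameRows and 
sortFirst are hypothetical; they are not taken from the patch or from Pig's 
Util class, and real tests would compare Pig tuples rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig's test utilities.
public class OrderAwareCheck {

    /**
     * Compares actual and expected rows.
     * If sortFirst is true (e.g. when running in Spark mode), both lists are
     * copied and sorted before comparison, so row order is ignored.
     * If sortFirst is false (e.g. MR/Tez), the order must match exactly, so a
     * change in engine ordering would still make the test fail.
     */
    public static boolean sameRows(List<String> actual, List<String> expected,
                                   boolean sortFirst) {
        if (!sortFirst) {
            return actual.equals(expected);
        }
        // Sort copies so the caller's lists are left untouched.
        List<String> a = new ArrayList<>(actual);
        List<String> e = new ArrayList<>(expected);
        Collections.sort(a);
        Collections.sort(e);
        return a.equals(e);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(1,a)", "(2,b)", "(3,c)");
        List<String> shuffled = Arrays.asList("(3,c)", "(1,a)", "(2,b)");
        System.out.println(sameRows(shuffled, expected, false)); // prints "false"
        System.out.println(sameRows(shuffled, expected, true));  // prints "true"
    }
}
```

With a check of this shape, the strict comparison stays in place for MR/Tez, 
and only the Spark run tolerates a different row order.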

I attempted to avoid introducing an explicit "ORDER BY" to order the actual 
results, but in a few cases it seems unavoidable:
 - TestEvalPipeline.testExpressionReUse: Here we do some math after calling 
the DISTINCT command
 - TestEvalPipeline.testArithmeticCloning: Again, we do some math after 
calling the DISTINCT command
 - TestEvalPipeline2.testLimitFlatten: Here GROUP BY is followed by a LIMIT 2

Let me know if you have suggestions for doing this more cleanly.

BTW: There is some ongoing discussion in SPARK-2926 about having reducers 
receive map outputs sorted by key, so we may have the same behavior in MR and 
Spark in a future Spark release.

> Fix ordering related failures in TestEvalPipeline for Spark
> -----------------------------------------------------------
>
>                 Key: PIG-4276
>                 URL: https://issues.apache.org/jira/browse/PIG-4276
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4276.patch, 
> TEST-org.apache.pig.test.TestEvalPipeline.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
