[
https://issues.apache.org/jira/browse/PIG-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504138#comment-14504138
]
Mohit Sabharwal commented on PIG-4276:
--------------------------------------
Thanks, [~kellyzly].
I avoided using Util.checkQueryOutputsAfterSort in both Spark and non-Spark
mode because, in our email thread with [~rohini], she preferred not to change
MR/Tez behavior (so that we notice if these engines change their current behavior).
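To illustrate the trade-off described above, here is a minimal sketch of a check that stays order-sensitive for MR/Tez and sorts before comparing only in Spark mode. The class name, the `resultsMatch` helper, and the `sparkMode` flag are hypothetical names made up for this example; this is not the implementation of Util.checkQueryOutputsAfterSort, only the general idea its name suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EngineAwareCheckSketch {

    // Hypothetical helper: strict (order-sensitive) comparison unless sparkMode is set,
    // in which case both sides are sorted first so that row order is ignored.
    static boolean resultsMatch(List<String> expected, List<String> actual, boolean sparkMode) {
        if (!sparkMode) {
            // MR/Tez path: keep the existing strict check so that a change in
            // output order in these engines still makes the test fail.
            return expected.equals(actual);
        }
        // Spark path: compare sorted copies; the inputs are left unmodified.
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(1,a)", "(2,b)", "(3,c)");
        List<String> shuffled = Arrays.asList("(3,c)", "(1,a)", "(2,b)");

        System.out.println(resultsMatch(expected, shuffled, false)); // strict check
        System.out.println(resultsMatch(expected, shuffled, true));  // sorted check
    }
}
```

With this shape, the same rows in a different order fail the strict check and pass the sorted one, so only the Spark path becomes tolerant of ordering.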
I tried to avoid introducing an explicit "ORDER BY" to order the actual
results, but in a few cases it seems unavoidable:
- TestEvalPipeline.testExpressionReUse: Here we do some math after calling
the DISTINCT command
- TestEvalPipeline.testArithmeticCloning: Again, we do some math after
calling the DISTINCT command
- TestEvalPipeline2.testLimitFlatten: Here GROUP BY is followed by a LIMIT 2
Let me know if you have suggestions for doing this more cleanly.
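As a rough illustration of why the LIMIT case is hard to handle by sorting the output afterwards, the sketch below models two engines that deliver the same groups in different orders. The class and method names are hypothetical and the lists stand in for query results; no Pig API is used. Taking the first two rows of each gives different row sets, so sorting after the limit cannot reconcile them, while ordering before the limit can.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LimitOrderSketch {

    // Models LIMIT n: keep the first n rows in whatever order they arrive.
    static List<Integer> limit(List<Integer> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    // Models ORDER BY followed by LIMIT n: sort a copy, then keep the first n rows.
    static List<Integer> orderThenLimit(List<Integer> rows, int n) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return limit(sorted, n);
    }

    public static void main(String[] args) {
        // Same groups, delivered in a different order by two hypothetical engines.
        List<Integer> engineA = Arrays.asList(1, 2, 3);
        List<Integer> engineB = Arrays.asList(3, 1, 2);

        // Without ordering, the limited outputs contain different rows.
        System.out.println(limit(engineA, 2) + " vs " + limit(engineB, 2));

        // With ordering before the limit, both engines agree.
        System.out.println(orderThenLimit(engineA, 2) + " vs " + orderThenLimit(engineB, 2));
    }
}
```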
BTW: There is an ongoing discussion in SPARK-2926 about having reducers receive
map outputs sorted by key, so we may get the same behavior in MR and Spark in a
future Spark release.
> Fix ordering related failures in TestEvalPipeline for Spark
> -----------------------------------------------------------
>
> Key: PIG-4276
> URL: https://issues.apache.org/jira/browse/PIG-4276
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4276.patch,
> TEST-org.apache.pig.test.TestEvalPipeline.txt
>
>
> error log is attached