[ 
https://issues.apache.org/jira/browse/PIG-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5040:
------------------------------------
    Status: Patch Available  (was: Open)

> Order by and CROSS partitioning is not deterministic due to usage of Random
> ---------------------------------------------------------------------------
>
>                 Key: PIG-5040
>                 URL: https://issues.apache.org/jira/browse/PIG-5040
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>            Priority: Critical
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-5040-1-nowhitespacechanges.patch, PIG-5040-1.patch
>
>
> Maps can be rerun due to shuffle fetch failures. Half of the reducers can end 
> up successfully pulling partitions from the first run of the map, while the 
> other half could pull from the rerun after shuffle fetch failures. If the data 
> is not partitioned by the Partitioner exactly the same way every time, it 
> could lead to incorrect results (loss of records and duplicated records). Even 
> though the issue has existed for 8 years now with order by and affects 
> MapReduce as well, it was found with Tez, where the frequency of reruns due to 
> shuffle fetch failures is high (the order-by partitioner gets its data from a 
> 1-1 edge, so there are no retries and shuffle fetch failures trigger a rerun 
> immediately).
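
The failure mode described above can be illustrated with a small standalone sketch. It is not the attached patch and does not use Pig, Hadoop, or Tez classes. The class name, the method names, and the idea of seeding from a task index are all assumptions made for illustration. The sketch only shows that a Random built from a stable seed assigns the same partitions on a rerun, whereas an unseeded Random gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from Pig.
public class DeterministicPartitionSketch {

    // Stand-in for a partitioner that spreads records across reducers
    // by drawing a partition number from a Random for each record.
    static List<Integer> assignPartitions(List<String> records, int numReducers, Random rng) {
        List<Integer> partitions = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            partitions.add(rng.nextInt(numReducers));
        }
        return partitions;
    }

    // Assumption: the seed is derived from something stable across reruns,
    // here a task index, so a rerun reproduces the same assignment.
    static List<Integer> runSeeded(List<String> records, int numReducers, int taskIndex) {
        return assignPartitions(records, numReducers, new Random(taskIndex));
    }

    // Unseeded Random: each run may send the same record to a different reducer.
    static List<Integer> runUnseeded(List<String> records, int numReducers) {
        return assignPartitions(records, numReducers, new Random());
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f");

        List<Integer> firstRun = runSeeded(records, 4, 7);
        List<Integer> rerun = runSeeded(records, 4, 7);
        // Same seed, same sequence of draws, so the two runs agree.
        System.out.println("seeded runs agree: " + firstRun.equals(rerun));

        // With an unseeded Random the two assignments are independent draws,
        // so reducers reading from different runs can miss or duplicate records.
        System.out.println("unseeded run 1: " + runUnseeded(records, 4));
        System.out.println("unseeded run 2: " + runUnseeded(records, 4));
    }
}
```

If half of the reducers read the first run and the other half read the rerun, only the seeded variant guarantees that every record is seen by exactly one reducer.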



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
