[ https://issues.apache.org/jira/browse/PIG-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576875#comment-15576875 ]
Daniel Dai commented on PIG-5040:
---------------------------------

You are right, single reducer should be fine.

> Order by and CROSS partitioning is not deterministic due to usage of Random
> ---------------------------------------------------------------------------
>
>                 Key: PIG-5040
>                 URL: https://issues.apache.org/jira/browse/PIG-5040
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>            Priority: Critical
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-5040-1-nowhitespacechanges.patch, PIG-5040-1.patch
>
>
> Maps can be rerun due to shuffle fetch failures. Half of the reducers can end
> up successfully pulling partitions from the first run of the map while the
> other half could pull from the rerun after shuffle fetch failures. If the data
> is not partitioned by the Partitioner exactly the same way every time, then it
> could lead to incorrect results (loss of records and duplicated records). Even
> though the issue has existed for 8 years now with order by and affects
> MapReduce as well, it was found with Tez, where the frequency of reruns due to
> shuffle fetch failures is high (the order by partitioner gets its data from a
> 1-1 edge, so there are no retries and shuffle fetch failures trigger a rerun
> immediately).
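
To illustrate the determinism requirement described in the issue, here is a minimal Java sketch. It is not the code from the attached patch, and it does not use the Hadoop or Pig APIs. The class `PartitionSketch`, the method `assignPartitions`, and the idea of seeding from a stable map task index are all assumptions made for illustration. The point it shows is only that a `java.util.Random` built from a fixed seed yields the same sequence on every run, so a rerun of the same map task would send each record to the same partition as the first run.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: not Pig's partitioner, only the seeding idea.
final class PartitionSketch {

    // Assigns each of numRecords records to one of numReducers partitions.
    // The Random is seeded from mapTaskIndex (an assumed stable identifier),
    // so the same task always produces the same assignment sequence.
    static int[] assignPartitions(int mapTaskIndex, int numRecords, int numReducers) {
        Random random = new Random(mapTaskIndex); // fixed seed, repeatable sequence
        int[] partitions = new int[numRecords];
        for (int i = 0; i < numRecords; i++) {
            partitions[i] = random.nextInt(numReducers);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Simulate the first run of map task 7 and a rerun of the same task.
        int[] firstRun = assignPartitions(7, 1000, 4);
        int[] rerun = assignPartitions(7, 1000, 4);
        System.out.println("rerun matches first run: " + Arrays.equals(firstRun, rerun));
    }
}
```

With an unseeded `new Random()` in place of the seeded one, the two runs would in general differ, which is the situation where reducers reading from different attempts of the same map can lose or duplicate records.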