Imran Rashid created SPARK-7829:
-----------------------------------
Summary: SortShuffleWriter writes inconsistent data & index files on stage retry
Key: SPARK-7829
URL: https://issues.apache.org/jira/browse/SPARK-7829
Project: Spark
Issue Type: Bug
Components: Shuffle, Spark Core
Affects Versions: 1.3.1
Reporter: Imran Rashid
Assignee: Imran Rashid
When a stage is retried, a shuffle map task may be re-run even if its earlier
attempt succeeded. If the retried task happens to be scheduled on the same
executor, its output is *appended* to the existing data file, while the newly
written index file still assumes the data starts at position 0. The shuffle map
output therefore appears corrupt: when the data file is read, the index file
points to the wrong location.
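The mismatch can be reproduced with a small model. This is a simplified sketch, not Spark's code. It assumes a layout loosely modelled on sort-based shuffle: one data file holding all reduce partitions back to back, plus an index file of cumulative 8-byte offsets starting at 0. The function names, file names and partition contents are hypothetical. With `append=True`, the retried attempt's bytes land after the stale bytes of the earlier attempt, but the index still points at offset 0. With `append=False`, the data file is truncated first, which is one possible fix in this model.

```python
import os
import struct
import tempfile


def write_map_output(data_path, index_path, partitions, append):
    """Hypothetical, simplified model of a sort-shuffle map task's output.

    partitions: list of bytes, one entry per reduce partition.
    append=True reproduces the reported bug: the data file is opened in
    append mode, yet the index is computed as if the data began at offset 0.
    """
    mode = "ab" if append else "wb"  # "wb" truncates any earlier attempt's data
    with open(data_path, mode) as data:
        for block in partitions:
            data.write(block)
    # Index holds len(partitions) + 1 cumulative offsets, always starting at 0.
    offsets = [0]
    for block in partitions:
        offsets.append(offsets[-1] + len(block))
    with open(index_path, "wb") as index:
        index.write(struct.pack(f">{len(offsets)}q", *offsets))


def read_partition(data_path, index_path, reduce_id):
    """Read one reduce partition using the offsets stored in the index file."""
    with open(index_path, "rb") as index:
        raw = index.read()
    offsets = struct.unpack(f">{len(raw) // 8}q", raw)
    start, end = offsets[reduce_id], offsets[reduce_id + 1]
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)


def demo(append):
    with tempfile.TemporaryDirectory() as d:
        data_path = os.path.join(d, "shuffle.data")    # hypothetical file names
        index_path = os.path.join(d, "shuffle.index")
        # First (successful) attempt of the map task.
        write_map_output(data_path, index_path, [b"old", b"xx"], append)
        # Stage retry re-runs the task on the same executor.
        write_map_output(data_path, index_path, [b"new", b"yy"], append)
        return [read_partition(data_path, index_path, r) for r in range(2)]


print(demo(append=True))   # [b'old', b'xx']  (index points at stale bytes)
print(demo(append=False))  # [b'new', b'yy']  (index matches the data file)
```

In the appending case, the data file ends up as `oldxxnewyy` while the index still reads `[0, 3, 5]`. A reducer therefore receives bytes from the earlier attempt. In real shuffle output those bytes are serialized or compressed streams, so this would surface as apparently corrupt data rather than merely stale data.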
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)