[ https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556246#comment-14556246 ]
Imran Rashid commented on SPARK-7829:
-------------------------------------
I'll submit a PR shortly.
> SortShuffleWriter writes inconsistent data & index files on stage retry
> -----------------------------------------------------------------------
>
> Key: SPARK-7829
> URL: https://issues.apache.org/jira/browse/SPARK-7829
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.3.1
> Reporter: Imran Rashid
> Assignee: Imran Rashid
>
> When a stage is retried, a shuffle map task may be rerun even if it had
> already completed successfully. If the rerun is scheduled on the same
> executor, its output is *appended* to the old data file, while the index
> file still assumes the data starts at position 0. The shuffle map output
> then appears corrupt: when the data file is read, the index file points to
> the wrong location.
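
To make the failure mode concrete, here is a toy model of the data/index mismatch described above. It is only a sketch, not Spark's SortShuffleWriter: the helpers `write_map_output` and `read_partition`, the comma-separated index format, and the file names are all hypothetical, and it assumes the retried task writes different bytes than the first attempt so the mismatch is visible.

```python
import os
import tempfile


def write_map_output(data_path, index_path, partitions, append):
    """Hypothetical writer: stores partition bytes in a data file and
    cumulative offsets in an index file. The index always starts at 0,
    mirroring the assumption described in the issue."""
    offsets = [0]
    # append=True models the buggy retry that reuses the old data file
    with open(data_path, "ab" if append else "wb") as data_file:
        for part in partitions:
            data_file.write(part)
            offsets.append(offsets[-1] + len(part))
    # The index file is rewritten from scratch on every attempt
    with open(index_path, "w") as index_file:
        index_file.write(",".join(str(o) for o in offsets))
    return offsets


def read_partition(data_path, index_path, i):
    """Hypothetical reader: returns the bytes between offsets i and i+1."""
    with open(index_path) as index_file:
        offsets = [int(o) for o in index_file.read().split(",")]
    with open(data_path, "rb") as data_file:
        data_file.seek(offsets[i])
        return data_file.read(offsets[i + 1] - offsets[i])


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    data = os.path.join(workdir, "shuffle.data")
    index = os.path.join(workdir, "shuffle.index")

    # First attempt: data file and index file agree
    write_map_output(data, index, [b"aaaa", b"bb"], append=False)
    print(read_partition(data, index, 0))

    # Retry on the same "executor": data is appended, index restarts at 0,
    # so reads return stale bytes from the first attempt
    write_map_output(data, index, [b"cc", b"dddd"], append=True)
    print(read_partition(data, index, 0))
    print(read_partition(data, index, 1))
```

In this model, the retry's index describes the new partitions but the offsets land in the bytes left over from the first attempt. Truncating the data file on retry (here, `append=False`) keeps the two files consistent; whether the actual fix takes that form is not stated in this message.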