[ 
https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004780#comment-15004780
 ] 

Andrew Or commented on SPARK-7829:
----------------------------------

I believe this is now fixed by https://github.com/apache/spark/pull/9610.
Let me know if this is not the case.

> SortShuffleWriter writes inconsistent data & index files on stage retry
> -----------------------------------------------------------------------
>
>                 Key: SPARK-7829
>                 URL: https://issues.apache.org/jira/browse/SPARK-7829
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>             Fix For: 1.5.3, 1.6.0
>
>
> When a stage is retried, a shuffle map task may be rerun even if it had 
> already succeeded.  If the rerun happens to be scheduled on the same 
> executor, the new output is *appended* to the old data file, while the 
> index file still assumes the data starts at position 0.  The shuffle map 
> output then appears corrupt, because when the data file is read, the index 
> file points to the wrong location.
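
The mismatch described above can be reproduced with a small toy model. This is only an illustrative sketch in Python, not Spark's actual shuffle code: the helper names (`write_map_output`, `read_partition`, `demo`) and the file name are hypothetical, and the second attempt is assumed to produce partitions of different sizes than the first so that the wrong offsets are visible.

```python
import os
import tempfile


def write_map_output(data_path, partitions, append):
    """Write partition bytes to data_path and return an offset index.

    The index always starts at 0, mirroring the assumption in the report.
    """
    offsets = [0]
    for chunk in partitions:
        offsets.append(offsets[-1] + len(chunk))
    # "ab" appends to any existing file; "wb" truncates it first.
    with open(data_path, "ab" if append else "wb") as f:
        for chunk in partitions:
            f.write(chunk)
    return offsets


def read_partition(data_path, offsets, i):
    """Read partition i using the [start, end) offsets from the index."""
    with open(data_path, "rb") as f:
        f.seek(offsets[i])
        return f.read(offsets[i + 1] - offsets[i])


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shuffle.data")  # hypothetical file name
        # First attempt succeeds and writes its output.
        write_map_output(path, [b"aaaa", b"bb"], append=False)
        # Retry on the same "executor": data is appended, index restarts at 0.
        retry = [b"c", b"ddddd"]
        index = write_map_output(path, retry, append=True)
        got = [read_partition(path, index, i) for i in range(len(retry))]
        return retry, got


if __name__ == "__main__":
    expected, got = demo()
    print("expected:", expected)
    print("read via index:", got)
```

In this sketch the reads land in the first attempt's bytes rather than the retry's, which is the "apparently corrupt" output the report describes. Passing `append=False` on the retry makes the index and data file agree again in this toy model; that is not a claim about how the linked pull request resolves the issue.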



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

