Imran Rashid created SPARK-7829:
-----------------------------------

             Summary: SortShuffleWriter writes inconsistent data & index files 
on stage retry
                 Key: SPARK-7829
                 URL: https://issues.apache.org/jira/browse/SPARK-7829
             Project: Spark
          Issue Type: Bug
          Components: Shuffle, Spark Core
    Affects Versions: 1.3.1
            Reporter: Imran Rashid
            Assignee: Imran Rashid


When a stage is retried, a shuffle map task may be rerun even if it had already 
completed successfully.  If the rerun is scheduled on the same executor, the new 
output is *appended* to the old data file, while the index file still assumes 
the data starts at position 0.  The shuffle map output then appears corrupt: 
when the data file is read, the offsets in the index file point to the wrong 
locations.
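
The mismatch can be reproduced outside Spark with a small simulation. This is 
only a sketch and not the SortShuffleWriter code: the file names, the helper 
functions (write_map_output, read_partition, simulate_retry) and the index 
layout (a flat list of big-endian 8-byte offsets) are assumptions made for 
illustration. Each attempt writes its partitions to the data file and then 
writes an index whose offsets start at 0; the retry uses different payloads so 
the stale reads are visible.

```python
import os
import struct
import tempfile


def write_map_output(data_path, index_path, partitions, append):
    """Write partition payloads to the data file and their offsets to the index.

    The index always records offsets starting from 0. With append=True the
    data file is opened in append mode, which reproduces the reported mismatch.
    """
    offsets = [0]
    with open(data_path, "ab" if append else "wb") as data:
        for payload in partitions:
            data.write(payload)
            offsets.append(offsets[-1] + len(payload))
    # Hypothetical index layout: one big-endian signed 64-bit offset per boundary.
    with open(index_path, "wb") as index:
        index.write(struct.pack(f">{len(offsets)}q", *offsets))


def read_partition(data_path, index_path, reduce_id):
    """Read one partition by looking up its byte range in the index file."""
    with open(index_path, "rb") as index:
        raw = index.read()
    offsets = struct.unpack(f">{len(raw) // 8}q", raw)
    start, end = offsets[reduce_id], offsets[reduce_id + 1]
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)


def simulate_retry(append):
    """Run a first attempt and a retry against the same files, then read back."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "map_output.data")
        index_path = os.path.join(tmp, "map_output.index")
        first_attempt = [b"aaaa", b"bb"]
        retry = [b"cc", b"dddd"]
        write_map_output(data_path, index_path, first_attempt, append)
        write_map_output(data_path, index_path, retry, append)
        return [read_partition(data_path, index_path, i) for i in range(len(retry))]


if __name__ == "__main__":
    print("append:  ", simulate_retry(append=True))
    print("truncate:", simulate_retry(append=False))
```

With append=True the retry's index is applied to a data file that still begins 
with the first attempt's bytes, so the reads return data from the first attempt 
rather than from the retry. With append=False the data file is truncated before 
the retry writes, and the index and data agree again.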



