[
https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557624#comment-14557624
]
Josh Rosen commented on SPARK-7829:
-----------------------------------
Thanks for splitting this off as a sub-issue from SPARK-7308. This issue might
be one of the last remaining pieces for explaining some of the shuffle
corruption issues that we've seen in sort-based shuffle. A bug here would
actually be consistent with some of the non-determinism of that issue, since it
sounds like this issue is only triggered in certain stage retry cases when
using certain shuffle paths.
As I commented over at SPARK-7308, the best way to address this might be with a
sort of commit protocol in the ShuffleMapTask code. Some of the fixes that
you've included for this as part of your other patch seem okay, but I think
that they're a little messy compared to avoiding the appends in the first
place. I was wondering whether we could just delete the old file rather than
append to it, but that might mess things up if a concurrent downstream stage is
fetching from those map output partitions while we're recomputing them.
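To make the failure mode concrete, here is a minimal sketch of the mechanism using plain file I/O. This is purely illustrative, not Spark's actual SortShuffleWriter code; the file name and offsets are made up. It shows how appending the retried output while recording index offsets from zero makes the reader fetch the stale bytes:

```python
# Illustrative sketch only -- plain file I/O, not Spark's shuffle writer.
import os
import tempfile

tmp = tempfile.mkdtemp()
data_path = os.path.join(tmp, "shuffle_0_0_0.data")  # hypothetical name

# First (successful) attempt writes its partition bytes.
with open(data_path, "wb") as f:
    f.write(b"old")

# The stage retry, scheduled on the same executor, opens the existing
# file in append mode, so the fresh bytes land *after* the stale ones.
with open(data_path, "ab") as f:
    f.write(b"new")

# The rewritten index still assumes the output starts at position 0.
start, end = 0, 3  # offsets recorded for the single partition

# A reducer following the index reads the wrong region of the file.
with open(data_path, "rb") as f:
    f.seek(start)
    fetched = f.read(end - start)

print(fetched)  # b'old' -- stale bytes instead of the retried b'new'
```

A commit protocol that atomically replaces (rather than appends to) the data file, with the index written to match, would avoid this mismatch.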
> SortShuffleWriter writes inconsistent data & index files on stage retry
> -----------------------------------------------------------------------
>
> Key: SPARK-7829
> URL: https://issues.apache.org/jira/browse/SPARK-7829
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.3.1
> Reporter: Imran Rashid
> Assignee: Imran Rashid
>
> When a stage is retried, a shuffle map task may be re-executed even if its
> earlier attempt was successful. If the retry happens to be scheduled on the
> same executor, the old data file is *appended* to, while the index file still
> assumes the data starts at position 0. The shuffle map output then appears
> corrupt, since when the data file is read, the index file points to the wrong
> location.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)