[ 
https://issues.apache.org/jira/browse/SPARK-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557620#comment-14557620
 ] 

Josh Rosen commented on SPARK-7308:
-----------------------------------

I think that properly fixing this set of issues will involve both scheduler and 
shuffle write path changes.

Spark's task cancellation is best-effort, so even if we fix the scheduler 
issues it is still possible that a delayed task from an earlier stage attempt 
might conflict with a task from a subsequent attempt.  I think that we should focus 
first on making it safe for multiple attempts of the same task to be running 
concurrently on the same executor, then focus on making the scheduler changes 
to prevent this scenario from happening. I like Marcelo's suggestion that 
different task attempts write their output to different files. Note, however, 
that the name of the shuffle output file is an implicit interface that's used 
by our external shuffle service. As a result, I think that we need to ensure 
that the final "winning" task attempt renames its temporary / staging files to 
the filenames that we're using now.  To do this, I think that we can implement 
some simple synchronization within an Executor JVM to implement 
last-writer-wins atomic renaming / commit of output files (this is similar in 
spirit to OutputCommitCoordinator, but _much_ simpler since it's local 
coordination).
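To make that concrete, a minimal executor-local sketch of the last-writer-wins commit might look like the following. This is plain Java, and `ShuffleFileCommitter` and its method names are hypothetical illustrations, not Spark APIs; the real change would live in the shuffle write path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: each task attempt writes to its own temporary file,
// and the commit step atomically renames it to the final shuffle file name
// that the external shuffle service expects. The JVM-local lock serializes
// commits from concurrent attempts on the same executor, so the last
// committer wins.
public class ShuffleFileCommitter {
    private final Object commitLock = new Object();

    public void commit(Path attemptTempFile, Path finalShuffleFile)
            throws IOException {
        synchronized (commitLock) {
            // REPLACE_EXISTING gives last-writer-wins; ATOMIC_MOVE ensures
            // readers never observe a partially written final file.
            Files.move(attemptTempFile, finalShuffleFile,
                       StandardCopyOption.ATOMIC_MOVE,
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Because the rename happens under a single per-executor lock, this stays much simpler than OutputCommitCoordinator: there is no cross-node protocol, just local mutual exclusion plus an atomic filesystem move.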

Once we fix the safety issue, we can then address the scheduler logic changes. 
I think that addressing these two pieces in this order makes the most sense, 
since scheduler changes have historically been very hard to perform correctly.  
Since these sets of changes are largely orthogonal, splitting them into 
separate patches will significantly lower our review burden and make things 
easier for component-specific maintainers (e.g. it'll be easier for the 
scheduler maintainers to review a smaller patch without a bunch of unrelated 
changes to executor commit coordination).

Since it sounds like you already have a good test that reproduces all of the 
bugs, I would welcome a patch which commits a failing test (we could just wrap 
it in a try-catch block or add an expected exception to the test declaration). 
This will help to keep the testing work that you've done so far from bitrotting 
while we work on the fix.
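For the try-catch variant, the idea would be something like the sketch below (plain Java; all names are illustrative placeholders, not the actual reproduction): run the reproduction, treat the known failure as expected for now, and fail loudly once the bug stops reproducing so we know to remove the wrapper.

```java
// Hypothetical sketch of checking in a known-failing reproduction without
// breaking the build: the wrapper "passes" while the bug still reproduces
// and "fails" once it is fixed, signaling that the wrapper should go away.
public class KnownFailingRepro {
    static void reproduceConcurrentAttemptBug() {
        // Placeholder for the real reproduction of the concurrent-attempt bug.
        throw new IllegalStateException("stage attempt conflict");
    }

    public static boolean runExpectingKnownFailure() {
        try {
            reproduceConcurrentAttemptBug();
            return false; // bug no longer reproduces: time to un-wrap the test
        } catch (IllegalStateException expected) {
            return true;  // known failure still present, as expected for now
        }
    }
}
```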

> Should there be multiple concurrent attempts for one stage?
> -----------------------------------------------------------
>
>                 Key: SPARK-7308
>                 URL: https://issues.apache.org/jira/browse/SPARK-7308
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>         Attachments: SPARK-7308_discussion.pdf
>
>
> Currently, when there is a fetch failure, you can end up with multiple 
> concurrent attempts for the same stage.  Is this intended?  At best, it leads 
> to some very confusing behavior, and it makes it hard for the user to make 
> sense of what is going on.  At worst, I think this is the cause of some very 
> strange errors we've seen from users, where stages start executing before all 
> of their dependent stages have completed.
> This can happen in the following scenario: there is a fetch failure in 
> attempt 0, so the stage is retried.  Attempt 1 starts.  But tasks from 
> attempt 0 are still running -- some of them can also hit fetch failures after 
> attempt 1 starts.  That will cause additional stage attempts to get fired up.
> There is an attempt to handle this already 
> https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105
> but that only checks whether the **stage** is running.  It really should 
> check whether that **attempt** is still running, but there isn't enough info 
> to do that.  
> I'll also post some info on how to reproduce this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
