[
https://issues.apache.org/jira/browse/SPARK-19560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kay Ousterhout closed SPARK-19560.
----------------------------------
Resolution: Fixed
Target Version/s: 2.2.0
> Improve tests for when DAGScheduler learns of "successful" ShuffleMapTask
> from a failed executor
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-19560
> URL: https://issues.apache.org/jira/browse/SPARK-19560
> Project: Spark
> Issue Type: Test
> Components: Scheduler
> Affects Versions: 2.1.1
> Reporter: Kay Ousterhout
> Assignee: Kay Ousterhout
> Priority: Minor
>
> There's some tricky code around the case when the DAGScheduler learns of a
> ShuffleMapTask that completed successfully, but ran on an executor that
> failed sometime after the task was launched. This case is tricky because the
> TaskSetManager (i.e., the lower level scheduler) thinks the task completed
> successfully, but the DAGScheduler considers the output it generated to be no
> longer valid (because it was probably lost when the executor was lost). As a
> result, the DAGScheduler needs to re-submit the stage, so that the task can
> be re-run. Some existing tests exercise this case, but the behavior is not
> clearly documented, so we should improve the tests to make it explicit and
> prevent future bugs (this was encountered by [~markhamstra] in attempting to
> find a better fix for SPARK-19263).
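
The scenario described above can be sketched with a toy model. This is only an
illustration, not Spark's scheduler code: the class ToyDagScheduler, its
methods, and the epoch counter used to decide whether a completion predates an
executor failure are all hypothetical names and assumptions introduced here.
It shows a task that reports success from an executor that failed after
launch, the higher-level scheduler ignoring that output, and the partition
being marked for re-run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDagScheduler:
    """Toy stand-in for the higher-level scheduler (hypothetical, not Spark's API)."""
    num_partitions: int
    epoch: int = 0                                     # advanced on each executor loss
    failed_epoch: dict = field(default_factory=dict)   # executor id -> epoch at loss
    output_locs: dict = field(default_factory=dict)    # partition -> executor id

    def launch_epoch(self) -> int:
        """Epoch stamped on a task when it is launched."""
        return self.epoch

    def executor_lost(self, exec_id: str) -> None:
        """Record the failure and forget map outputs stored on that executor."""
        self.failed_epoch[exec_id] = self.epoch
        self.epoch += 1
        self.output_locs = {p: e for p, e in self.output_locs.items() if e != exec_id}

    def task_succeeded(self, partition: int, exec_id: str, task_epoch: int) -> bool:
        """Register the map output unless the task predates the executor's failure."""
        lost_at = self.failed_epoch.get(exec_id)
        if lost_at is not None and task_epoch <= lost_at:
            return False  # "successful" task, but its output is assumed lost
        self.output_locs[partition] = exec_id
        return True

    def partitions_to_resubmit(self) -> list:
        """Partitions with no valid output; the stage must re-run these."""
        return [p for p in range(self.num_partitions) if p not in self.output_locs]


sched = ToyDagScheduler(num_partitions=2)
launched_at = sched.launch_epoch()           # both tasks launched before the failure
sched.executor_lost("exec-1")                # exec-1 fails after the task was launched
print(sched.task_succeeded(0, "exec-1", launched_at))  # False: output ignored
print(sched.task_succeeded(1, "exec-2", launched_at))  # True: healthy executor
print(sched.partitions_to_resubmit())                  # [0]
```

The lower-level scheduler's view (the task finished successfully) is left out
on purpose; the point is only that a success report alone is not enough for
the output to count, so the stage still has a missing partition to re-run.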