[ https://issues.apache.org/jira/browse/SPARK-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925125#comment-15925125 ]
Apache Spark commented on SPARK-14649:
--------------------------------------
User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/17297
> DagScheduler re-starts all running tasks on fetch failure
> ---------------------------------------------------------
>
> Key: SPARK-14649
> URL: https://issues.apache.org/jira/browse/SPARK-14649
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Reporter: Sital Kedia
>
> When a fetch failure occurs, the DAGScheduler re-launches the previous stage
> (to re-generate output that was missing), and then re-launches all tasks in
> the stage with the fetch failure that hadn't *completed* when the fetch
> failure occurred (the DAGScheduler re-launches all of the tasks whose output
> data is not available -- which is equivalent to the set of tasks that hadn't
> yet completed).
> The assumption when this code was originally written was that when a fetch
> failure occurred, the output from at least one of the tasks in the previous
> stage was no longer available, so all of the tasks in the current stage would
> eventually fail due to not being able to access that output. This assumption
> does not hold for some large-scale, long-running workloads. E.g., there's
> one use case where a job has ~100k tasks that each run for about 1 hour, and
> only the first 5-10 minutes are spent fetching data. Because of the large
> number of tasks, it's very common to see a few tasks fail in the fetch phase,
> and it's wasteful to re-run other tasks that had already finished fetching
> data and so aren't affected by the fetch failure (and may be most of the way
> through
> their hour-long execution). The DAGScheduler should not re-start these tasks.
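The sketch below is a minimal, self-contained illustration of the two policies described above; it is not Spark's actual DAGScheduler code, and names like Task, tasksToRerunCurrent, and tasksToRerunProposed are hypothetical simplifications introduced only for this example.

    object FetchFailurePolicySketch {

      // A toy model of a reduce-stage task: whether it has already completed,
      // and whether it was the task that actually hit the fetch failure.
      case class Task(id: Int, finished: Boolean, failedFetch: Boolean)

      // Current behavior described in the issue: on a fetch failure, every task
      // in the stage that has not yet completed is re-launched, even if it had
      // already finished fetching its shuffle input.
      def tasksToRerunCurrent(tasks: Seq[Task]): Seq[Task] =
        tasks.filterNot(_.finished)

      // Behavior the issue argues for: only re-launch the tasks that actually
      // hit the fetch failure; tasks past the fetch phase keep running.
      def tasksToRerunProposed(tasks: Seq[Task]): Seq[Task] =
        tasks.filter(t => !t.finished && t.failedFetch)

      def main(args: Array[String]): Unit = {
        val tasks = Seq(
          Task(0, finished = true,  failedFetch = false), // already done
          Task(1, finished = false, failedFetch = true),  // hit the fetch failure
          Task(2, finished = false, failedFetch = false), // deep into its hour-long run
          Task(3, finished = false, failedFetch = false)
        )
        println(s"current policy re-runs:  ${tasksToRerunCurrent(tasks).map(_.id)}")
        println(s"proposed policy re-runs: ${tasksToRerunProposed(tasks).map(_.id)}")
      }
    }

Under the current policy the example re-runs tasks 1, 2, and 3, discarding whatever progress tasks 2 and 3 had made past their fetch phase; under the proposed policy only task 1 is re-run.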