[ https://issues.apache.org/jira/browse/SPARK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615522#comment-14615522 ]
Apache Spark commented on SPARK-8297:
-------------------------------------
User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/7243
> Scheduler backend is not notified in case node fails in YARN
> ------------------------------------------------------------
>
> Key: SPARK-8297
> URL: https://issues.apache.org/jira/browse/SPARK-8297
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0
> Environment: Spark on YARN, in both client and cluster mode.
> Reporter: Mridul Muralidharan
> Priority: Critical
>
> When a node crashes, YARN detects the failure and notifies Spark, but this
> information is not propagated to the scheduler backend (unlike in Mesos mode,
> for example).
> As a result, stages are re-executed repeatedly (because of
> FetchFailedException on the shuffle side), and the application eventually fails.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)