[ https://issues.apache.org/jira/browse/SPARK-20996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038404#comment-16038404 ]
Apache Spark commented on SPARK-20996:
--------------------------------------
User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18213
> Better handling AM reattempt based on exit code in yarn mode
> ------------------------------------------------------------
>
> Key: SPARK-20996
> URL: https://issues.apache.org/jira/browse/SPARK-20996
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 2.2.0
> Reporter: Saisai Shao
> Priority: Minor
>
> YARN provides a max-attempts configuration for applications running on it, so
> an application has the chance to retry itself after a failure. In the current
> Spark code, whatever failure the AM hits, the RM restarts the AM as long as
> the max attempt count has not been reached. This is not reasonable when the
> failure comes from the AM itself, such as a user code failure, OOM, a Spark
> issue, or executor failures: the reattempt of the AM will very likely meet
> the same issue again. The AM should be retried only when it fails due to an
> external issue such as a crash, a process kill, or an NM failure.
> So this issue proposes to improve the reattempt mechanism to retry only when
> the AM meets external issues.
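
The policy proposed in the description above can be illustrated with a
minimal sketch. This is not the code from the pull request: the failure
categories, the set INTERNAL_FAILURES, and the function should_reattempt are
all hypothetical names chosen for illustration, and the real change may map
AM exit codes differently.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical failure categories; not Spark's or YARN's real constants."""
    USER_CODE_ERROR = "user_code_error"
    OUT_OF_MEMORY = "out_of_memory"
    SPARK_ISSUE = "spark_issue"
    EXECUTOR_FAILURES = "executor_failures"
    AM_CRASH = "am_crash"
    PROCESS_KILLED = "process_killed"
    NODE_MANAGER_FAILURE = "node_manager_failure"


# Failures that come from the AM itself. Per the description, a reattempt
# would most likely hit the same problem again, so these are not retried.
INTERNAL_FAILURES = frozenset({
    FailureKind.USER_CODE_ERROR,
    FailureKind.OUT_OF_MEMORY,
    FailureKind.SPARK_ISSUE,
    FailureKind.EXECUTOR_FAILURES,
})


def should_reattempt(kind, attempt, max_attempts):
    """Return True if the AM should be started again.

    `attempt` is the 1-based number of the attempt that just failed
    (an assumption of this sketch).
    """
    if attempt >= max_attempts:
        # The max-attempts limit is exhausted, whatever the cause.
        return False
    # Retry only external failures (crash, process kill, NM failure).
    return kind not in INTERNAL_FAILURES
```

With max_attempts set to 2, a first attempt killed externally would be
retried, while a first attempt that failed in user code would not.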
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)