[
https://issues.apache.org/jira/browse/SPARK-37097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
angerszhu updated SPARK-37097:
------------------------------
Description:
1. In cluster mode, the AM shutdown hook is triggered.
2. The AM's unregister call to the RM times out, but the AM shutdown hook wraps
it in a try/catch, so the AM container still exits with code 0.
3. Because the RM has lost its connection to the AM, it treats this container
as failed.
4. The client side then gets an application report with final status FAILED,
even though the AM container exited with code 0. A retry then follows.
was:
In cluster mode, the AM shutdown hook is triggered and the AM's unregister call
to the RM times out, but the AM shutdown hook has a try/catch, so the AM
container exits with code 0. Because the RM has lost its connection to the AM,
it treats this container as failed.
The client side then gets an application report with final status FAILED, even
though the AM container exited with code 0. A retry then follows.
> yarn-cluster mode, unregister timeout cause spark retry but AM container exit
> with code 0
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-37097
> URL: https://issues.apache.org/jira/browse/SPARK-37097
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Priority: Major
>
> 1. In cluster mode, the AM shutdown hook is triggered.
> 2. The AM's unregister call to the RM times out, but the AM shutdown hook
> wraps it in a try/catch, so the AM container still exits with code 0.
> 3. Because the RM has lost its connection to the AM, it treats this container
> as failed.
> 4. The client side then gets an application report with final status FAILED,
> even though the AM container exited with code 0. A retry then follows.
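
The sequence in the description can be illustrated with a small toy model. This
is only a sketch under stated assumptions: FakeResourceManager,
run_am_shutdown_hook, retry_follows and simulate are hypothetical names
invented for illustration, and none of this is Spark's ApplicationMaster code
or a real YARN API. It only shows how a swallowed unregister timeout can leave
the container exit code at 0 while the RM-side final status ends up FAILED.

```python
class UnregisterTimeout(Exception):
    """Raised when the simulated unregister call to the RM times out."""


class FakeResourceManager:
    """Hypothetical stand-in for the YARN RM; not a real YARN API."""

    def __init__(self, unregister_times_out):
        self.unregister_times_out = unregister_times_out
        self.final_status = "UNDEFINED"

    def unregister(self, status):
        # Step 2: the unregister call may time out before the RM records it.
        if self.unregister_times_out:
            raise UnregisterTimeout("unregister timed out")
        self.final_status = status

    def on_am_lost(self):
        # Step 3: an AM that goes away without unregistering is treated as failed.
        if self.final_status == "UNDEFINED":
            self.final_status = "FAILED"


def run_am_shutdown_hook(rm):
    """Return the simulated AM container exit code after the hook runs."""
    try:
        rm.unregister("SUCCEEDED")
    except UnregisterTimeout:
        # The error is swallowed here, so it never reaches the exit code.
        pass
    return 0


def retry_follows(rm):
    # Step 4: the reported final status, not the exit code, drives the retry.
    return rm.final_status == "FAILED"


def simulate(unregister_times_out):
    rm = FakeResourceManager(unregister_times_out)
    exit_code = run_am_shutdown_hook(rm)  # steps 1 and 2
    rm.on_am_lost()                       # step 3
    return exit_code, rm.final_status, retry_follows(rm)
```

In this model, simulate(True) reproduces the reported mismatch (exit code 0
alongside a FAILED final status and a retry), while simulate(False) is the
normal path in which the unregister call succeeds.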