[ 
https://issues.apache.org/jira/browse/SPARK-37097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-37097:
------------------------------
    Description: 

1. In cluster mode, the AM shutdown hook is triggered.
2. The AM's unregister request to the RM times out, but the AM shutdown hook
wraps it in a try/catch, so the AM container exits with code 0.
3. Because the RM has lost its connection with the AM, it treats the
container as failed.
4. The client then gets an application report with final status failed, even
though the AM container exited with code 0, and the application is retried.

  was:
In cluster mode, the AM shutdown hook is triggered and the AM's unregister
request to the RM times out. The AM shutdown hook wraps it in a try/catch, so
the AM container exits with code 0. But because the RM has lost its connection
with the AM, it treats the container as failed.

The client then gets an application report with final status failed, even
though the AM container exited with code 0, and the application is retried.


> In yarn-cluster mode, an unregister timeout causes Spark to retry although
> the AM container exits with code 0
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-37097
>                 URL: https://issues.apache.org/jira/browse/SPARK-37097
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: angerszhu
>            Priority: Major
>
> 1. In cluster mode, the AM shutdown hook is triggered.
> 2. The AM's unregister request to the RM times out, but the AM shutdown hook
> wraps it in a try/catch, so the AM container exits with code 0.
> 3. Because the RM has lost its connection with the AM, it treats the
> container as failed.
> 4. The client then gets an application report with final status failed, even
> though the AM container exited with code 0, and the application is retried.
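
The sequence described above can be sketched as a toy model. This is not
Spark or YARN code: the names UnregisterTimeout, unregister_from_rm,
run_shutdown_hook and retry_triggered are hypothetical, and the model assumes
only what the description states, namely that the hook catches the unregister
failure, the process still exits with 0, and the reported final status is
what drives the retry.

```python
# Toy model of the reported sequence; none of these names are Spark or YARN APIs.


class UnregisterTimeout(Exception):
    """Stands in for the timeout hit when the AM cannot reach the RM."""


def unregister_from_rm(rm_reachable):
    # Hypothetical stand-in for the AM's unregister request to the RM.
    if not rm_reachable:
        raise UnregisterTimeout("unregister from RM timed out")
    return "SUCCEEDED"


def run_shutdown_hook(rm_reachable):
    """Return (am_exit_code, final_status_as_recorded_by_the_rm)."""
    try:
        final_status = unregister_from_rm(rm_reachable)
    except UnregisterTimeout:
        # The hook swallows the error, so the AM process still exits with 0.
        # The RM never saw the unregister request and records a failure.
        final_status = "FAILED"
    return 0, final_status


def retry_triggered(final_status):
    # Assumption from the description: the retry decision follows the
    # reported final status, not the container exit code.
    return final_status == "FAILED"


exit_code, status = run_shutdown_hook(rm_reachable=False)
print(exit_code, status, retry_triggered(status))
```

In this sketch the exit code and the final status disagree whenever the
unregister call times out, which is the mismatch the issue reports.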



