Terry Kim created SPARK-31625:
---------------------------------

             Summary: Unregister application from YARN resource manager outside the shutdown hook
                 Key: SPARK-31625
                 URL: https://issues.apache.org/jira/browse/SPARK-31625
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 3.1.0
            Reporter: Terry Kim


Currently, an application is unregistered from the YARN ResourceManager
inside a shutdown hook. If the shutdown hook does not run to completion
(e.g., because it times out), the application is never unregistered, and
YARN resubmits the application even though it succeeded.

For example, the driver log may show the following:
{code:java}
20/04/30 06:20:29 INFO SparkContext: Successfully stopped SparkContext
20/04/30 06:20:29 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
20/04/30 06:20:59 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
{code}
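The trace above points at a timed wait on the hook task (the `FutureTask.get` frame inside `ShutdownHookManager.executeShutdown`). The following is a simplified, self-contained sketch of that pattern, not Hadoop's actual code: `HookTimeoutSketch` and `runWithTimeout` are hypothetical names, and the timeout value is left to the caller. It shows why work placed in a hook is silently skipped once the deadline passes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of running a shutdown hook under a deadline. */
public class HookTimeoutSketch {

    /**
     * Runs the hook on a separate thread and waits at most timeoutMillis.
     * Returns true only if the hook finished normally within the deadline.
     */
    public static boolean runWithTimeout(Runnable hook, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(hook);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Deadline passed: the hook is abandoned, so anything it had
            // not done yet (such as unregistering) never happens.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The hook itself threw an exception.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

If the unregister call sits at the end of a hook that first waits on slow cleanup, a `false` result here corresponds to the application never being unregistered.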
On the YARN RM side:
{code:java}
2020-04-30 06:21:25,083 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1588227360159_0001_01_000001 Container Transitioned from RUNNING to COMPLETED
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1588227360159_0001_000001 with final state: FAILED, and exit status: 0
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1588227360159_0001_000001 State change from RUNNING to FINAL_SAVING on event = CONTAINER_FINISHED
{code}
The final state of the application attempt becomes FAILED because the
container finishes before the application is unregistered.
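
One possible shape for the improvement named in the summary, as a sketch only: unregister explicitly on the normal completion path and keep the shutdown hook as an idempotent fallback. This is not Spark's ApplicationMaster code. `UnregisterSketch`, `RmClient`, and `unregisterOnce` are hypothetical names, and the status strings are placeholders for whatever the real client expects.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: unregister on the normal path, hook as fallback. */
public class UnregisterSketch {

    /** Hypothetical stand-in for the client that talks to the YARN RM. */
    public interface RmClient {
        void unregister(String finalStatus);
    }

    private final RmClient client;
    private final AtomicBoolean unregistered = new AtomicBoolean(false);

    public UnregisterSketch(RmClient client) {
        this.client = client;
    }

    /** Idempotent: only the first call reaches the RM; returns whether it did. */
    public boolean unregisterOnce(String finalStatus) {
        if (unregistered.compareAndSet(false, true)) {
            client.unregister(finalStatus);
            return true;
        }
        return false;
    }

    public void run(Runnable userApp) {
        // Fallback only: covers exits where the normal path below is never reached.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> unregisterOnce("FAILED")));

        userApp.run();

        // Normal path: unregister before the JVM starts shutting down, so a
        // slow or timed-out hook can no longer prevent it.
        unregisterOnce("SUCCEEDED");
    }
}
```

With this ordering, the hook finds the flag already set after a successful run and does nothing, so a hook timeout no longer affects the final state reported to the RM.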


