Subroto Sanyal created SPARK-15652:
--------------------------------------
Summary: Missing org.apache.spark.launcher.SparkAppHandle.Listener
notification if the SparkSubmit JVM shuts down
Key: SPARK-15652
URL: https://issues.apache.org/jira/browse/SPARK-15652
Project: Spark
Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Subroto Sanyal
Priority: Critical
h6. Problem
If the SparkSubmit JVM goes down before sending the job-completion
notification, the _org.apache.spark.launcher.SparkAppHandle.Listener_ will not
receive any notification, which may cause the client using SparkLauncher to
hang indefinitely.
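The hang can be sketched with plain JDK classes (no Spark dependency): the client blocks on a latch that only a listener callback releases. The {{CompletionListener}} interface and class names below are hypothetical stand-ins for _SparkAppHandle.Listener_, not Spark API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class HangDemo {
    // Hypothetical stand-in for SparkAppHandle.Listener: fires on a terminal state.
    interface CompletionListener { void onFinished(); }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        CompletionListener listener = done::countDown;

        // Simulate the failure mode: the SparkSubmit JVM dies before it can
        // deliver the terminal-state notification, so onFinished() is never
        // invoked and the client blocks. A real client with no timeout would
        // wait here forever; we use a timeout only to make the demo terminate.
        boolean finished = done.await(500, TimeUnit.MILLISECONDS);
        System.out.println(finished ? "completed" : "timed out waiting for notification");
    }
}
```

A client calling {{await()}} without a timeout here would hang indefinitely, which is exactly the reported symptom.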
h6. Root Cause
There is no proper exception handling in
org.apache.spark.launcher.LauncherConnection#run when an EOFException is
encountered while reading from the socket stream. An EOFException is typically
thrown at that point when the SparkSubmit JVM shuts down.
It was probably assumed that the SparkSubmit JVM shuts down only on normal,
healthy completion, but there are scenarios where this is not the case:
# The OS kills the SparkSubmit process via the OOM killer.
# An exception occurs while SparkSubmit submits the job, before it starts
monitoring the application. This can happen if SparkLauncher is not configured
properly. There appears to be no exception handling in
org.apache.spark.deploy.yarn.Client#submitApplication(), so any
exception/throwable at this point shuts down the JVM without proper
finalisation.
h6. Possible Solutions
# On EOFException (or any other exception), notify the listeners that the job
has failed.
# Better exception handling on the SparkSubmit JVM side (though this alone may
not resolve the problem completely).
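Solution 1 can be sketched with a socket read loop that treats EOF as a lost connection and notifies a listener, rather than letting the reader die silently. This is a minimal, self-contained illustration of the proposed pattern using JDK sockets; the {{FailureListener}} interface and all class names are hypothetical, not the actual LauncherConnection code.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class EofNotifyDemo {
    // Hypothetical callback the client registers, analogous to a listener
    // being told the application failed.
    interface FailureListener { void onConnectionLost(String reason); }

    // Proposed pattern: catch EOF on the launcher socket and surface it to
    // listeners instead of exiting the read loop without any notification.
    static void readLoop(InputStream in, FailureListener listener) {
        try (DataInputStream data = new DataInputStream(in)) {
            while (true) {
                int msg = data.readInt();   // blocks until a message or EOF
                System.out.println("received message: " + msg);
            }
        } catch (EOFException eof) {
            // Peer JVM exited (OOM kill, failed submit, crash): tell the client.
            listener.onConnectionLost("remote JVM closed the connection");
        } catch (IOException ioe) {
            listener.onConnectionLost("I/O error: " + ioe.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket peer = server.accept()) {
            new DataOutputStream(peer.getOutputStream()).writeInt(42);
            peer.close();               // simulate the SparkSubmit JVM dying
            readLoop(client.getInputStream(),
                     reason -> System.out.println("listener notified: " + reason));
        }
    }
}
```

With this shape, an abrupt SparkSubmit shutdown degrades to a failure notification instead of an indefinite client hang.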