[
https://issues.apache.org/jira/browse/SPARK-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Subroto Sanyal updated SPARK-15652:
-----------------------------------
Description:
h6. Problem
If the SparkSubmit JVM goes down before sending the job-completion
notification, the _org.apache.spark.launcher.SparkAppHandle.Listener_ will not
receive any notification, which may cause a client using SparkLauncher to hang
indefinitely.
h6. Root Cause
There is no proper exception handling in
org.apache.spark.launcher.LauncherConnection#run when an EOFException is
encountered while reading from the socket stream. An EOFException will
typically be thrown at the indicated
point (_org.apache.spark.launcher.LauncherConnection.run(LauncherConnection.java:58)_)
when the SparkSubmit JVM shuts down.
It was probably assumed that the SparkSubmit JVM shuts down only after normal,
healthy completion, but there are scenarios where this is not the case:
# The OS kills the SparkSubmit process via the OOM killer.
# An exception occurs while SparkSubmit submits the job, before it even starts
monitoring the application. This can happen if SparkLauncher is not configured
properly. There appears to be no exception handling in
org.apache.spark.deploy.yarn.Client#submitApplication(), so any
exception/throwable at this point can shut down the JVM without proper
finalisation.
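To illustrate why the missing notification hangs the client: a typical SparkLauncher usage pattern registers a listener and blocks until it reports a final state. The sketch below is illustrative only — {{Listener}}, {{RegisterFn}}, {{waitForFinish}} and the state strings are stand-ins for SparkAppHandle.Listener and its states, not Spark API. If the connection dies silently, the callback never fires and the client waits forever (absent a timeout):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative client-side wait pattern (not Spark API): block until a
// listener reports a terminal state. If SparkSubmit dies before sending any
// notification, stateChanged() is never invoked and await() blocks forever
// unless a timeout is used.
public class ClientWaitSketch {

    interface Listener {                       // stand-in for SparkAppHandle.Listener
        void stateChanged(String state);
    }

    interface RegisterFn {                     // stand-in for handle registration
        void register(Listener l);
    }

    static boolean waitForFinish(RegisterFn register, long timeoutMs)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        register.register(state -> {
            // Only a terminal state releases the waiting client.
            if (state.equals("FINISHED") || state.equals("FAILED") || state.equals("LOST")) {
                done.countDown();
            }
        });
        // false => timed out, i.e. the hang scenario this issue describes
        return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a launcher whose connection died silently: the listener is
        // registered but no callback ever arrives.
        boolean finished = waitForFinish(l -> { /* no callbacks */ }, 200);
        System.out.println(finished ? "completed" : "still waiting (would hang without timeout)");
    }
}
```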
h6. Possible Solutions
# On EOFException (or any other exception), notify the listeners that the job
has failed.
# Better exception handling on the SparkSubmit JVM side (though this alone may
not resolve the problem completely).
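The first proposed solution could be sketched as follows. This is a simplified, self-contained model of the connection's read loop, not the actual LauncherConnection code; the {{Listener}} interface and the "LOST"/"FAILED" state strings are illustrative stand-ins:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Simplified model of a launcher connection read loop, showing the proposed
// fix: when the peer (SparkSubmit) JVM disappears mid-stream, surface a
// terminal state instead of exiting silently.
public class ConnectionLoopSketch {

    interface Listener {                       // stand-in for SparkAppHandle.Listener
        void stateChanged(String state);
    }

    static void runLoop(ObjectInputStream in, Listener listener) {
        try {
            while (true) {
                Object msg = in.readObject();  // blocks until a message or EOF
                listener.stateChanged("RUNNING: " + msg);
            }
        } catch (EOFException eof) {
            // Peer closed the socket without sending a final status: report a
            // terminal state so waiting clients are released rather than hung.
            listener.stateChanged("LOST");
        } catch (IOException | ClassNotFoundException e) {
            listener.stateChanged("FAILED");
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the peer dying before sending anything: the stream contains
        // only the serialization header, so readObject() hits EOF immediately.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new ObjectOutputStream(buf).close();   // writes just the stream header
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));

        String[] last = {null};
        runLoop(in, state -> last[0] = state);
        System.out.println(last[0]);           // terminal state instead of silence
    }
}
```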
> Missing org.apache.spark.launcher.SparkAppHandle.Listener notification if
> SparkSubmit JVM shuts down
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-15652
> URL: https://issues.apache.org/jira/browse/SPARK-15652
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Subroto Sanyal
> Priority: Critical
> Attachments: spark-launcher-client-hang.jar
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]