[ 
https://issues.apache.org/jira/browse/SPARK-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subroto Sanyal updated SPARK-15652:
-----------------------------------
    Description: 
h6. Problem
If the SparkSubmit JVM goes down before it sends the job-completion notification, the _org.apache.spark.launcher.SparkAppHandle.Listener_ never receives any notification, which may cause a client using SparkLauncher to hang indefinitely.
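For illustration only (not part of this report): a client typically blocks until a listener callback signals a terminal state, so if no notification ever arrives the wait never returns. A minimal stand-alone sketch of that pattern, where {{ClientHangSketch}} and its {{Listener}} type are hypothetical stand-ins rather than Spark APIs:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Stand-in for the SparkAppHandle.Listener pattern: the client releases a
// latch only from the terminal-state callback. If the SparkSubmit JVM dies
// before any notification is delivered, the await never completes (bounded
// here by a timeout purely for demonstration).
class ClientHangSketch {
    interface Listener { void stateChanged(boolean finalState); }

    static boolean waitForCompletion(long timeoutMs, Consumer<Listener> register)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        register.accept(finalState -> { if (finalState) done.countDown(); });
        // Returns false if no terminal-state notification arrived in time.
        return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```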
h6. Root Cause
There is no proper exception handling in org.apache.spark.launcher.LauncherConnection#run when an EOFException is encountered while reading from the socket stream. An EOFException is typically thrown at that point when the SparkSubmit JVM shuts down. It was probably assumed that the SparkSubmit JVM only shuts down after a normal, healthy completion, but there are scenarios where this is not the case:
# The OS kills the SparkSubmit process via the OOM killer.
# An exception occurs while SparkSubmit is submitting the job, before it starts monitoring the application. This can happen if SparkLauncher is not configured properly. There appears to be no exception handling in org.apache.spark.deploy.yarn.Client#submitApplication(), so any exception/throwable at that point can shut down the JVM without proper finalisation.

h6. Possible Solutions
# On EOFException (or any other exception), notify the listeners that the job has failed.
# Add better exception handling on the SparkSubmit JVM side (though this alone may not resolve the problem completely).
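A hedged sketch of the first proposed solution; the class and listener names below are stand-ins, not the actual LauncherConnection code. The idea is to catch the EOFException in the read loop and report a lost connection to the listeners instead of letting the thread die silently:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

// Stand-in for LauncherConnection's read loop. When the SparkSubmit JVM
// dies, readObject() throws EOFException; instead of letting the thread
// exit silently, we flag the connection as lost and notify the listener.
class ConnectionSketch implements Runnable {
    interface FailureListener { void connectionLost(); }  // hypothetical callback

    private final ObjectInputStream in;
    private final FailureListener listener;
    volatile boolean lost = false;

    ConnectionSketch(ObjectInputStream in, FailureListener listener) {
        this.in = in;
        this.listener = listener;
    }

    @Override
    public void run() {
        try {
            while (true) {
                handle(in.readObject());  // blocks until the peer writes
            }
        } catch (EOFException eof) {
            // The peer JVM went away before sending a final state:
            // surface the failure rather than swallowing it.
            lost = true;
            listener.connectionLost();
        } catch (IOException | ClassNotFoundException e) {
            lost = true;
            listener.connectionLost();
        }
    }

    private void handle(Object msg) { /* dispatch state updates */ }
}
```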




> Missing org.apache.spark.launcher.SparkAppHandle.Listener notification if 
> SparkSubmit JVM shuts down
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15652
>                 URL: https://issues.apache.org/jira/browse/SPARK-15652
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Subroto Sanyal
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
