[ 
https://issues.apache.org/jira/browse/SPARK-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-16752.
------------------------------------
    Resolution: Invalid

This is not the place to report bugs in the Spark Job Server (SJS), which is 
a separate project, unrelated to Apache Spark.

> Spark Job Server not releasing jobs from "running list" even after YARN 
> completes the job
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-16752
>                 URL: https://issues.apache.org/jira/browse/SPARK-16752
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.6.0, 1.5.0
>         Environment: SJS version 0.6.1 and Spark 1.5.0 running in 
> yarn-client mode
>            Reporter: Ash Pran
>              Labels: patch
>         Attachments: SJS_JOBS_RUNNING, SJS_JOB_COMP_YARN, 
> SJS_JOB_LOG_CONSOLE, SJS_Limited_Log.txt
>
>
> We are seeing a strange issue with Spark Job Server (SJS).
> We are using SJS 0.6.1 and Spark 1.5.0 in "yarn-client" mode. The contents 
> of settings.sh for SJS are as follows:
> ********************************************************************
> INSTALL_DIR=$(cd `dirname $0`; pwd -P)   # absolute path of the SJS install dir
> LOG_DIR=$INSTALL_DIR/logs                # job server log directory
> PIDFILE=spark-jobserver.pid
> JOBSERVER_MEMORY=16G                     # heap for the job server JVM
> SPARK_VERSION=1.5.0
> SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/spark
> SPARK_CONF_DIR=$SPARK_HOME/conf          # use the cluster's Spark config
> SCALA_VERSION=2.10.4
> ********************************************************************
> We are using fair scheduling with 2 pools and 50 executors of 1 GB each.
> We also have max-jobs-per-context set to the number of cores, which is 48.
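> For reference, here is a minimal sketch of that scheduler wiring (pool 
> names and file paths are illustrative placeholders, not taken from our 
> cluster):
> ********************************************************************
> # Enable fair scheduling and point Spark at the pool definitions
> # (spark-defaults.conf entries):
> #   spark.scheduler.mode             FAIR
> #   spark.scheduler.allocation.file  /path/to/fairscheduler.xml
> #
> # Define the 2 pools ("poolA"/"poolB" are placeholder names):
> cat > fairscheduler.xml <<'EOF'
> <?xml version="1.0"?>
> <allocations>
>   <pool name="poolA">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>1</weight>
>   </pool>
>   <pool name="poolB">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>1</weight>
>   </pool>
> </allocations>
> EOF
> #
> # SJS-side cap on concurrent jobs per context, in the job server's
> # HOCON config:
> #   spark.jobserver.max-jobs-per-context = 48
> ********************************************************************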
> For the first 5 minutes or so everything is fine and the jobs get processed 
> normally. After that, we see these 2 issues happening at random:
> 1) There are no jobs running and the cluster is completely available, yet 
> SJS accepts a request but does not submit it to the cluster for almost 3-4 
> minutes, and the job sits in the "running jobs" list for that long.
> 2) SJS accepts a request, submits it to the cluster, and the cluster 
> finishes the job, but even then SJS does not move the job to the completed 
> list; it keeps it in the "running jobs" list for 3-4 minutes, and during 
> this time our application keeps waiting for the response.
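> Both symptoms can be observed from the outside via the SJS REST API (a 
> sketch; host and port assume the default binding of localhost:8090):
> ********************************************************************
> # List all jobs with their status; a stuck job stays RUNNING here long
> # after YARN reports it finished.
> curl -s 'http://localhost:8090/jobs'
> #
> # Poll a single job by id, timestamping each response, to measure the
> # lag between YARN completion and SJS marking the job finished:
> JOB_ID=4747ae86-7de3-4819-a29c-2b2c80c568a2
> while true; do
>   echo "$(date +%T) $(curl -s http://localhost:8090/jobs/$JOB_ID)"
>   sleep 5
> done
> ********************************************************************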
> More details are documented at the external issue URL. The detailed steps 
> are outlined below.
> #1 The first screenshot (SJS_JOBS_RUNNING) shows the running jobs list. 
> Please look at the first and the last rows: the time submitted for the 
> last job id in the screenshot (4747ae86-7de3-4819-a29c-2b2c80c568a2) is 
> "16:49:00".
> #2 The second screenshot (SJS_JOB_COMP_YARN), from the Spark YARN cluster, 
> shows that the same job had already completed at "16:49:25".
> #3 The third screenshot (SJS_JOB_LOG_CONSOLE) comes from the Spark Job 
> Server log; it shows the same job completing only at "17:13:55".
> So SJS was essentially holding onto the job for more than 24 minutes 
> (16:49:25 to 17:13:55) and kept it in the running jobs list, even though 
> YARN had responded in time. Please also take a look at the attached SJS log 
> (SJS_Limited_Log.txt) for the time period around when this job was 
> submitted; the timestamp cross-check is sketched below.
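> The timestamp comparison can be reproduced from the shell as well (a 
> sketch; the log file name and <application_id> are placeholders):
> ********************************************************************
> # SJS's view of the job: grep the job server log for the job id and
> # compare the log timestamps with the completion time YARN shows.
> grep 4747ae86-7de3-4819-a29c-2b2c80c568a2 $LOG_DIR/spark-job-server.log
> #
> # Application-level start/finish times as YARN sees them, for reference:
> yarn application -status <application_id>
> ********************************************************************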


