GitHub user egorklimov opened a pull request:
https://github.com/apache/zeppelin/pull/3165
[ZEPPELIN-3704] Scheduler.getJobsRunning() returns finished jobs
### What is this PR for?
Sometimes, when cron is configured with the "After execution stop the
interpreter" setting enabled, the last paragraphs are marked as ABORT for no
apparent reason. I found that the cause is that Scheduler.getJobsRunning()
returns finished jobs. (I hit this problem in 0.8, but the same bug probably
exists in 0.9.)
Short log (with additional log info from TinkoffCreditSystems fork):
```
INFO [2018-08-10 00:08:00,000] ({DefaultQuartzScheduler_Worker-47}
Notebook.java[execute]:945) - Start schedule run note: 2C68U586U, cronExpr:"0 8
0 * * ?"
INFO [2018-08-10 00:08:00,047] ({pool-2-thread-266}
SchedulerFactory.java[jobStarted]:109) - Job 20170814-171621_1685490119 started
by scheduler
INFO [2018-08-10 00:10:35,387] ({pool-2-thread-266}
SchedulerFactory.java[jobFinished]:115) - Job 20170814-171621_1685490119
finished by scheduler
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-greenplum_pd:user:2C68U586U-shared_session
INFO [2018-08-10 00:10:35,417] ({pool-2-thread-3838}
SchedulerFactory.java[jobStarted]:109) - Job 20180402-171122_400058927 started
by scheduler
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
INFO [2018-08-10 00:11:57,428] ({pool-2-thread-3838}
SchedulerFactory.java[jobFinished]:115) - Job 20180402-171122_400058927
finished by scheduler
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
INFO [2018-08-10 00:11:57,445] ({pool-2-thread-996}
SchedulerFactory.java[jobStarted]:109) - Job 20180413-191933_1545337614 started
by scheduler
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
INFO [2018-08-10 00:11:57,527] ({pool-2-thread-996}
NotebookServer.java[afterStatusChange]:2631) - Job 20180413-191933_1545337614
is finished successfully, status: FINISHED
INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47}
Paragraph.java[execute]:343) - skip to run blank paragraph.
20180423-134725_1702290212
INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47}
Notebook.java[execute]:947) - End schedule run note: 2C68U586U
INFO [2018-08-10 00:11:57,548] ({DefaultQuartzScheduler_Worker-47}
ManagedInterpreterGroup.java[close]:100) - Close Session: shared_session for
interpreter setting: spark
INFO [2018-08-10 00:11:57,553] ({pool-2-thread-996}
VFSNotebookRepo.java[save]:196) - Saving note:2C68U586U
Third job status changes from FINISHED to ABORT:
WARN [2018-08-10 00:11:57,555] ({DefaultQuartzScheduler_Worker-47}
NotebookServer.java[afterStatusChange]:2633) - Job 20180413-191933_1545337614
is finished, status: ABORT, exception: null, result: %text 'sometext'
INFO [2018-08-10 00:11:57,577] ({pool-2-thread-996}
SchedulerFactory.java[jobFinished]:115) - Job 20180413-191933_1545337614
finished by scheduler
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
INFO [2018-08-10 00:11:57,585] ({DefaultQuartzScheduler_Worker-47}
ManagedInterpreterGroup.java[close]:130) - Job
paragraph_1523636373190_-1466164905 aborted
```
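To make the sequence in the log above easier to follow, here is a minimal, self-contained Java sketch of the failure mode. It is not Zeppelin code: `AbortRaceSketch`, `Job`, `Status`, and `abortAll` are hypothetical stand-ins, and the sketch only assumes that closing the session aborts every job the scheduler still reports as running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Zeppelin's classes.
class AbortRaceSketch {
    enum Status { RUNNING, FINISHED, ABORT }

    static final class Job {
        final String id;
        Status status;

        Job(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    // Mimics a session close that aborts every job reported as running,
    // without re-checking each job's current status.
    static void abortAll(List<Job> reportedAsRunning) {
        for (Job job : reportedAsRunning) {
            job.status = Status.ABORT;
        }
    }

    public static void main(String[] args) {
        Job last = new Job("paragraph-3", Status.RUNNING);
        List<Job> running = new ArrayList<>();
        running.add(last);              // the scheduler still tracks the job

        last.status = Status.FINISHED;  // the job completes; the list entry is stale
        abortAll(running);              // the close acts on the stale entry

        System.out.println(last.id + " -> " + last.status);
    }
}
```

In this sketch the job ends up as ABORT even though it had already finished, which mirrors the FINISHED to ABORT transition of the third job in the log.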
### What type of PR is it?
Bug Fix
### What is the Jira issue?
* Issue: https://issues.apache.org/jira/browse/ZEPPELIN-3704
### How should this be tested?
* CI failed on 4th test
https://travis-ci.org/TinkoffCreditSystems/zeppelin/builds/421046867 with:
```
Tests in error:
PersonalizeActionsIT.testSimpleAction:152->AbstractZeppelinIT.waitForParagraph:73->AbstractZeppelinIT.pollingWait:99
» Timeout
PersonalizeActionsIT.testGraphAction:194->AbstractZeppelinIT.pollingWait:99
» Timeout
PersonalizeActionsIT.testDynamicFormAction:267->AbstractZeppelinIT.pollingWait:99
» Timeout
Tests run: 29, Failures: 0, Errors: 3, Skipped: 0
```
It seems that something went wrong with Travis.
* Tested in TinkoffCreditSystems fork, new log:
```
INFO [2018-08-27 04:00:00,001] ({DefaultQuartzScheduler_Worker-30}
Notebook.java[execute]:947) - Start schedule run note: 2DJUZ2HJX, cronExpr:"0 0
0/1 * * ?"
...
INFO [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
Notebook.java[execute]:949) - End schedule run note: 2DJUZ2HJX
INFO [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
ManagedInterpreterGroup.java[close]:100) - Close Session: shared_session for
interpreter setting: spark
ERROR [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
RemoteScheduler.java[getJobsRunning]:138) - Tried to add
paragraph_1532602460612_1917281840 to list of running jobs, but job status is
FINISHED
ERROR [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
RemoteScheduler.java[getJobsRunning]:138) - Tried to add
paragraph_1532602460620_1914203849 to list of running jobs, but job status is
FINISHED
WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is
not opened for org.apache.zeppelin.spark.SparkInterpreter
WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is
not opened for org.apache.zeppelin.spark.SparkSqlInterpreter
WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is
not opened for org.apache.zeppelin.spark.DepInterpreter
WARN [2018-08-27 04:00:11,627] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is
not opened for org.apache.zeppelin.spark.IPySparkInterpreter
WARN [2018-08-27 04:00:11,653] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is
not opened for org.apache.zeppelin.spark.SparkRInterpreter
INFO [2018-08-27 04:00:11,655] ({DefaultQuartzScheduler_Worker-30}
ManagedInterpreterGroup.java[close]:105) - Remove this InterpreterGroup:
spark:user:2DJUZ2HJX as all the sessions are closed
INFO [2018-08-27 04:00:11,655] ({DefaultQuartzScheduler_Worker-30}
ManagedInterpreterGroup.java[close]:108) - Kill RemoteInterpreterProcess
INFO [2018-08-27 04:00:11,661] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreterManagedProcess.java[stop]:220) - Kill interpreter process
WARN [2018-08-27 04:00:14,188] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreterManagedProcess.java[stop]:230) - ignore the exception when
shutting down
INFO [2018-08-27 04:00:14,191] ({DefaultQuartzScheduler_Worker-30}
RemoteInterpreterManagedProcess.java[stop]:238) - Remote process terminated
```
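The ERROR lines in the log above suggest a status check when the list of running jobs is built. The actual change to RemoteScheduler is not included in this message, so the following is only a sketch of that kind of guard under stated assumptions: `RunningJobsFilterSketch`, `Job`, and `Status` are hypothetical names, and the real method signature may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a status guard; not the code from this PR.
class RunningJobsFilterSketch {
    enum Status { PENDING, RUNNING, FINISHED, ERROR, ABORT }

    static final class Job {
        final String id;
        final Status status;

        Job(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    // Returns only jobs whose current status is RUNNING.
    // Any other tracked job is reported and left out of the result.
    static List<Job> getJobsRunning(List<Job> tracked) {
        List<Job> result = new ArrayList<>();
        for (Job job : tracked) {
            if (job.status == Status.RUNNING) {
                result.add(job);
            } else {
                System.err.println("Tried to add " + job.id
                        + " to list of running jobs, but job status is " + job.status);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> tracked = new ArrayList<>();
        tracked.add(new Job("paragraph-a", Status.FINISHED));
        tracked.add(new Job("paragraph-b", Status.RUNNING));

        for (Job job : getJobsRunning(tracked)) {
            System.out.println("still running: " + job.id);
        }
    }
}
```

With a guard like this, a caller that aborts the returned jobs on session close no longer touches jobs that have already finished.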
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/TinkoffCreditSystems/zeppelin ZEPPELIN-3704
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/3165.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3165
----
commit 40e9ff1411f2e4c4db182522c990e2bee8c5fab2
Author: egorklimov <klim.electronicmail@...>
Date: 2018-08-13T08:50:42Z
RemoteScheduler updated
----