GitHub user egorklimov opened a pull request:

    https://github.com/apache/zeppelin/pull/3165

    [ZEPPELIN-3704] Scheduler.getJobsRunning() returns finished jobs

    ### What is this PR for?
    
    Sometimes, when cron is configured with the "After execution stop the 
interpreter" setting enabled, the last paragraphs are marked as ABORT for no 
apparent reason. I found that the cause is that Scheduler.getJobsRunning() 
returns finished jobs. (I hit this problem in 0.8, but the same bug may also 
be present in 0.9.) 
    Short log (with additional log info from TinkoffCreditSystems fork):
    ```
     INFO [2018-08-10 00:08:00,000] ({DefaultQuartzScheduler_Worker-47} 
Notebook.java[execute]:945) - Start schedule run note: 2C68U586U, cronExpr:"0 8 
0 * * ?"
     INFO [2018-08-10 00:08:00,047] ({pool-2-thread-266} 
SchedulerFactory.java[jobStarted]:109) - Job 20170814-171621_1685490119 started 
by scheduler  
     INFO [2018-08-10 00:10:35,387] ({pool-2-thread-266} 
SchedulerFactory.java[jobFinished]:115) - Job 20170814-171621_1685490119 
finished by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-greenplum_pd:user:2C68U586U-shared_session
     INFO [2018-08-10 00:10:35,417] ({pool-2-thread-3838} 
SchedulerFactory.java[jobStarted]:109) - Job 20180402-171122_400058927 started 
by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
     INFO [2018-08-10 00:11:57,428] ({pool-2-thread-3838} 
SchedulerFactory.java[jobFinished]:115) - Job 20180402-171122_400058927 
finished by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
     INFO [2018-08-10 00:11:57,445] ({pool-2-thread-996} 
SchedulerFactory.java[jobStarted]:109) - Job 20180413-191933_1545337614 started 
by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
     INFO [2018-08-10 00:11:57,527] ({pool-2-thread-996} 
NotebookServer.java[afterStatusChange]:2631) - Job 20180413-191933_1545337614 
is finished successfully, status: FINISHED
     INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47} 
Paragraph.java[execute]:343) - skip to run blank paragraph. 
20180423-134725_1702290212
     INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47} 
Notebook.java[execute]:947) - End schedule run note: 2C68U586U
     INFO [2018-08-10 00:11:57,548] ({DefaultQuartzScheduler_Worker-47} 
ManagedInterpreterGroup.java[close]:100) - Close Session: shared_session for 
interpreter setting: spark
     INFO [2018-08-10 00:11:57,553] ({pool-2-thread-996} 
VFSNotebookRepo.java[save]:196) - Saving note:2C68U586U
    
        The third job's status changes from FINISHED to ABORT 
    
     WARN [2018-08-10 00:11:57,555] ({DefaultQuartzScheduler_Worker-47} 
NotebookServer.java[afterStatusChange]:2633) - Job 20180413-191933_1545337614 
is finished, status: ABORT, exception: null, result: %text 'sometext'
     INFO [2018-08-10 00:11:57,577] ({pool-2-thread-996} 
SchedulerFactory.java[jobFinished]:115) - Job 20180413-191933_1545337614 
finished by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
     INFO [2018-08-10 00:11:57,585] ({DefaultQuartzScheduler_Worker-47} 
ManagedInterpreterGroup.java[close]:130) - Job 
paragraph_1523636373190_-1466164905 aborted 
    ```
    
    ### What type of PR is it?
    Bug Fix
    
    ### What is the Jira issue?
    * Issue: https://issues.apache.org/jira/browse/ZEPPELIN-3704
    
    ### How should this be tested?
    * CI failed on the 4th test 
https://travis-ci.org/TinkoffCreditSystems/zeppelin/builds/421046867 with:
    ```
    Tests in error: 
      
PersonalizeActionsIT.testSimpleAction:152->AbstractZeppelinIT.waitForParagraph:73->AbstractZeppelinIT.pollingWait:99
 » Timeout
      
PersonalizeActionsIT.testGraphAction:194->AbstractZeppelinIT.pollingWait:99 
» Timeout
      
PersonalizeActionsIT.testDynamicFormAction:267->AbstractZeppelinIT.pollingWait:99
 » Timeout
    
    Tests run: 29, Failures: 0, Errors: 3, Skipped: 0
    ```
    It seems that something went wrong with Travis.
    * Tested in TinkoffCreditSystems fork, new log:
    ```
     INFO [2018-08-27 04:00:00,001] ({DefaultQuartzScheduler_Worker-30} 
Notebook.java[execute]:947) - Start schedule run note: 2DJUZ2HJX, cronExpr:"0 0 
0/1 * * ?"
     ...
     INFO [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
Notebook.java[execute]:949) - End schedule run note: 2DJUZ2HJX
     INFO [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
ManagedInterpreterGroup.java[close]:100) - Close Session: shared_session for 
interpreter setting: spark
    ERROR [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
RemoteScheduler.java[getJobsRunning]:138) - Tried to add 
paragraph_1532602460612_1917281840 to list of running jobs, but job status is 
FINISHED
    ERROR [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
RemoteScheduler.java[getJobsRunning]:138) - Tried to add 
paragraph_1532602460620_1914203849 to list of running jobs, but job status is 
FINISHED
     WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is 
not opened for org.apache.zeppelin.spark.SparkInterpreter
     WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is 
not opened for org.apache.zeppelin.spark.SparkSqlInterpreter
     WARN [2018-08-27 04:00:11,619] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is 
not opened for org.apache.zeppelin.spark.DepInterpreter
     WARN [2018-08-27 04:00:11,627] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is 
not opened for org.apache.zeppelin.spark.IPySparkInterpreter
     WARN [2018-08-27 04:00:11,653] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is 
not opened for org.apache.zeppelin.spark.SparkRInterpreter
     INFO [2018-08-27 04:00:11,655] ({DefaultQuartzScheduler_Worker-30} 
ManagedInterpreterGroup.java[close]:105) - Remove this InterpreterGroup: 
spark:user:2DJUZ2HJX as all the sessions are closed
     INFO [2018-08-27 04:00:11,655] ({DefaultQuartzScheduler_Worker-30} 
ManagedInterpreterGroup.java[close]:108) - Kill RemoteInterpreterProcess
     INFO [2018-08-27 04:00:11,661] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreterManagedProcess.java[stop]:220) - Kill interpreter process
     WARN [2018-08-27 04:00:14,188] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreterManagedProcess.java[stop]:230) - ignore the exception when 
shutting down
     INFO [2018-08-27 04:00:14,191] ({DefaultQuartzScheduler_Worker-30} 
RemoteInterpreterManagedProcess.java[stop]:238) - Remote process terminated
    ```
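    The ERROR lines in the log above suggest a status check before a job is 
    reported as running. A minimal sketch of that kind of guard follows. It is 
    not the actual RemoteScheduler code: the class RunningJobsSketch, the Job 
    and Status types, and the shape of getJobsRunning() here are all 
    hypothetical stand-ins, and only the wording of the log message is taken 
    from the log above.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch; not the actual RemoteScheduler implementation.
    class RunningJobsSketch {

        // Simplified stand-in for a job status.
        enum Status { PENDING, RUNNING, FINISHED, ABORT }

        // Simplified stand-in for a scheduled job (a paragraph).
        static final class Job {
            final String id;
            final Status status;

            Job(String id, Status status) {
                this.id = id;
                this.status = status;
            }
        }

        // Returns only the jobs whose status is RUNNING, so a caller that
        // aborts "running" jobs on close never touches a FINISHED one.
        static List<Job> getJobsRunning(List<Job> candidates) {
            List<Job> running = new ArrayList<>();
            for (Job job : candidates) {
                if (job.status == Status.RUNNING) {
                    running.add(job);
                } else {
                    // Message wording mirrors the ERROR lines in the log.
                    System.err.println("Tried to add " + job.id
                        + " to list of running jobs, but job status is "
                        + job.status);
                }
            }
            return running;
        }

        public static void main(String[] args) {
            List<Job> candidates = Arrays.asList(
                new Job("paragraph_a", Status.FINISHED),
                new Job("paragraph_b", Status.RUNNING));
            for (Job job : getJobsRunning(candidates)) {
                System.out.println(job.id);
            }
        }
    }
    ```

    With a guard like this, closing the interpreter session after a cron run 
    would only abort jobs that are still RUNNING, which matches the new log: 
    the finished paragraphs are reported and skipped instead of being aborted.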
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TinkoffCreditSystems/zeppelin ZEPPELIN-3704

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/3165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3165
    
----
commit 40e9ff1411f2e4c4db182522c990e2bee8c5fab2
Author: egorklimov <klim.electronicmail@...>
Date:   2018-08-13T08:50:42Z

    RemoteScheduler updated

----

