Repository: zeppelin
Updated Branches:
  refs/heads/master 4bf0f3a91 -> 888a05d1e


[ZEPPELIN-3077] Cron scheduler is easy to get stuck when one of the cron jobs 
takes long time or gets stuck

### What is this PR for?
The cron scheduler is easy to get stuck when one of the cron jobs takes long 
time or gets stuck.

I sometimes come across the issue that the cron scheduler stops working 
suddenly. According to the thread dump of ZeppelinServer, all of the 
DefaultQuartzScheduler_Worker threads were waiting for the job's completion and 
there was no thread to launch a new job.

Here is the contents of the thread dump:

```
"DefaultQuartzScheduler_Worker-10" #76 prio=5 os_prio=0 tid=0x00007fb41d3b4000 
nid=0x1b521 sleeping[0x00007fb3daef1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000000c0a7dbf0> (a java.lang.Object)

   Locked ownable synchronizers:
        - None

"DefaultQuartzScheduler_Worker-9" #75 prio=5 os_prio=0 tid=0x00007fb41d3b2000 
nid=0x1b520 waiting on condition [0x00007fb3daff2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000000c0a7a470> (a java.lang.Object)

   Locked ownable synchronizers:
        - None

...

"DefaultQuartzScheduler_Worker-2" #68 prio=5 os_prio=0 tid=0x00007fb41d3c8800 
nid=0x1b519 waiting on condition [0x00007fb3da473000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000000c0a7a7b0> (a java.lang.Object)

   Locked ownable synchronizers:
        - None

"DefaultQuartzScheduler_Worker-1" #67 prio=5 os_prio=0 tid=0x00007fb41d3cc800 
nid=0x1b518 waiting on condition [0x00007fb3da372000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000000c0a7dd90> (a java.lang.Object)

   Locked ownable synchronizers:
        - None
```

The above thread dump says that all of the worker threads get stuck at 
https://github.com/apache/zeppelin/blob/v0.7.3/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java#L889.

One way to reproduce this kind of issue is creating a paragraph whose status is 
"READY" and "disable run". That makes the paragraph status "READY" permanently 
and `note.isTerminated()` never turns to `true`.

To fix this issue, the following two improvements has been made at this PR:

1. Remove the unnecessary `while (!note.isTerminated()) { ... }` block because 
the execution of all of the paragraphs is finished after `note.runAll()`.
2. Skip the cron execution if there is a running or pending paragraph. That 
prevents the Zeppelin cron scheduler from getting stuck by the long running 
paragraph whose execution duration is greater than the cron execution cycle.

### What type of PR is it?
[Bug]

### Todos

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-3077

### How should this be tested?
* Tested manually.
    1. The cron scheduler does not get stuck if there is a paragraph whose 
status is "READY" and "disable run".
    2. The following message is printed on the log file when the cron job is 
launched while the previous cron job still has been running.
        * `execution of the cron job is skipped because there is a running or 
pending paragraph (note id: XXXXXXXXX)`

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No.
* Is there breaking changes for older versions? No.
* Does this needs documentation? Yes. The behavior of the cron job was changed 
not to run if there is a running or pending paragraph by this PR. Thus, the 
documentation `docs/usage/other_features/cron_scheduler.md` was also added by 
this PR. Its layout is as follow:

<img width="711" alt="screen shot 2017-11-28 at 18 30 54" 
src="https://user-images.githubusercontent.com/31149688/33312407-20664e02-d46b-11e7-9715-9e2562d5e064.png";>

Author: Keiji Yoshida <[email protected]>

Closes #2687 from kjmrknsn/ZEPPELIN-3077 and squashes the following commits:

81e72188d [Keiji Yoshida] [ZEPPELIN-3077] Cron scheduler is easy to get stuck 
when one of the cron jobs takes long time or gets stuck


Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/888a05d1
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/888a05d1
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/888a05d1

Branch: refs/heads/master
Commit: 888a05d1e48a75f237edf7b6bfa7d4e57b100548
Parents: 4bf0f3a
Author: Keiji Yoshida <[email protected]>
Authored: Mon Nov 27 19:48:47 2017 +0900
Committer: Lee moon soo <[email protected]>
Committed: Tue Dec 19 16:05:45 2017 -0800

----------------------------------------------------------------------
 docs/_includes/themes/zeppelin/_navigation.html |   1 +
 .../img/docs-img/cron_scheduler_dialog_box.png  | Bin 0 -> 135264 bytes
 docs/usage/other_features/cron_scheduler.md     |  52 +++++++++++++++++++
 .../java/org/apache/zeppelin/notebook/Note.java |  16 ++++++
 .../org/apache/zeppelin/notebook/Notebook.java  |  13 +++--
 .../apache/zeppelin/notebook/NotebookTest.java  |  40 ++++++++++++++
 6 files changed, 115 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/_includes/themes/zeppelin/_navigation.html
----------------------------------------------------------------------
diff --git a/docs/_includes/themes/zeppelin/_navigation.html 
b/docs/_includes/themes/zeppelin/_navigation.html
index 95d83ea..796b8bc 100644
--- a/docs/_includes/themes/zeppelin/_navigation.html
+++ b/docs/_includes/themes/zeppelin/_navigation.html
@@ -62,6 +62,7 @@
                 <li><a 
href="{{BASE_PATH}}/usage/other_features/personalized_mode.html">Personalized 
Mode</a></li>
                 <li><a 
href="{{BASE_PATH}}/usage/other_features/customizing_homepage.html">Customizing 
Zeppelin Homepage</a></li>
                 <li><a 
href="{{BASE_PATH}}/usage/other_features/notebook_actions.html">Notebook 
Actions</a></li>
+                <li><a 
href="{{BASE_PATH}}/usage/other_features/cron_scheduler.html">Cron 
Scheduler</a></li>
                 <li role="separator" class="divider"></li>
                 <li class="title"><span>REST API</span></li>
                 <li><a 
href="{{BASE_PATH}}/usage/rest_api/interpreter.html">Interpreter API</a></li>

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png
----------------------------------------------------------------------
diff --git 
a/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png 
b/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png
new file mode 100644
index 0000000..5063398
Binary files /dev/null and 
b/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png differ

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/usage/other_features/cron_scheduler.md
----------------------------------------------------------------------
diff --git a/docs/usage/other_features/cron_scheduler.md 
b/docs/usage/other_features/cron_scheduler.md
new file mode 100644
index 0000000..e8a9975
--- /dev/null
+++ b/docs/usage/other_features/cron_scheduler.md
@@ -0,0 +1,52 @@
+---
+layout: page
+title: "Running a Notebook on a Given Schedule Automatically"
+description: "You can run a notebook on a given schedule automatically by 
setting up a cron scheduler on the notebook."
+group: usage/other_features
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+# Running a Notebook on a Given Schedule Automatically
+
+<div id="toc"></div>
+
+Apache Zeppelin provides a cron scheduler for each notebook. You can run a 
notebook on a given schedule automatically by setting up a cron scheduler on 
the notebook.
+
+## Setting up a cron scheduler on a notebook
+
+Click the clock icon on the tool bar and open a cron scheduler dialog box.
+
+<img 
src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png"
 />
+
+There are the following items which you can input or set:
+
+### Preset
+
+You can set a cron schedule easily by clicking each option such as `1m` and 
`5m`. The login user is set as a cron executing user automatically. You can 
also clear the cron schedule settings by clicking `None`.
+
+### Cron expression
+
+You can set the cron schedule by filling in this form. Please see [Cron 
Trigger 
Tutorial](http://www.quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/crontrigger)
 for the available cron syntax.
+
+### Cron executing user
+
+You can set the cron executing user by filling in this form and press the 
enter key.
+
+### auto-restart interpreter on cron execution
+
+When this checkbox is set to "on", the interpreters which are binded to the 
notebook are stopped automatically after the cron execution. This feature is 
useful if you want to release the interpreter resources after the cron 
execution.
+
+> **Note**: A cron execution is skipped if one of the paragraphs is in a state 
of `RUNNING` or `PENDING` no matter whether it is executed automatically (i.e. 
by the cron scheduler) or manually by a user opening this notebook.

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
----------------------------------------------------------------------
diff --git 
a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java 
b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
index 6e66732..19f396e 100644
--- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
+++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
@@ -652,6 +652,22 @@ public class Note implements ParagraphJobListener, 
JsonSerializable {
     return true;
   }
 
+  /**
+   * Return true if there is a running or pending paragraph
+   */
+  boolean isRunningOrPending() {
+    synchronized (paragraphs) {
+      for (Paragraph p : paragraphs) {
+        Status status = p.getStatus();
+        if (status.isRunning() || status.isPending()) {
+          return true;
+        }
+      }
+    }
+
+    return false;
+  }
+
   public boolean isTrash() {
     String path = getName();
     if (path.charAt(0) == '/') {

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
----------------------------------------------------------------------
diff --git 
a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java 
b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
index 8de981e..d68cd4b 100644
--- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
+++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
@@ -890,16 +890,15 @@ public class Notebook implements NoteEventListener {
 
       String noteId = 
context.getJobDetail().getJobDataMap().getString("noteId");
       Note note = notebook.getNote(noteId);
-      note.runAll();
 
-      while (!note.isTerminated()) {
-        try {
-          Thread.sleep(1000);
-        } catch (InterruptedException e) {
-          logger.error(e.toString(), e);
-        }
+      if (note.isRunningOrPending()) {
+        logger.warn("execution of the cron job is skipped because there is a 
running or pending " +
+            "paragraph (note id: {})", noteId);
+        return;
       }
 
+      note.runAll();
+
       boolean releaseResource = false;
       String cronExecutingUser = null;
       try {

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
----------------------------------------------------------------------
diff --git 
a/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java 
b/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
index ba9e177..83c0932 100644
--- 
a/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
+++ 
b/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
@@ -362,6 +362,46 @@ public class NotebookTest extends AbstractInterpreterTest 
implements JobListener
   }
 
   @Test
+  public void testScheduleAgainstRunningAndPendingParagraph() throws 
InterruptedException, IOException {
+    // create a note
+    Note note = notebook.createNote(anonymous);
+    interpreterSettingManager.setInterpreterBinding("user", note.getId(),
+            interpreterSettingManager.getInterpreterSettingIds());
+
+    // append running and pending paragraphs to the note
+    for (Status status: new Status[]{Status.RUNNING, Status.PENDING}) {
+      Paragraph p = note.addNewParagraph(AuthenticationInfo.ANONYMOUS);
+      Map config = new HashMap<>();
+      p.setConfig(config);
+      p.setText("p");
+      p.setStatus(status);
+      assertNull(p.getDateFinished());
+    }
+
+    // set cron scheduler, once a second
+    Map config = note.getConfig();
+    config.put("enabled", true);
+    config.put("cron", "* * * * * ?");
+    note.setConfig(config);
+    notebook.refreshCron(note.getId());
+    Thread.sleep(2 * 1000);
+
+    // remove cron scheduler.
+    config.put("cron", null);
+    note.setConfig(config);
+    notebook.refreshCron(note.getId());
+    Thread.sleep(2 * 1000);
+
+    // check if the executions of the running and pending paragraphs were 
skipped
+    for (Paragraph p : note.paragraphs) {
+      assertNull(p.getDateFinished());
+    }
+
+    // remove the note
+    notebook.removeNote(note.getId(), anonymous);
+  }
+
+  @Test
   public void testSchedulePoolUsage() throws InterruptedException, IOException 
{
     final int timeout = 30;
     final String everySecondCron = "* * * * * ?";

Reply via email to