Repository: zeppelin
Updated Branches:
refs/heads/master 4bf0f3a91 -> 888a05d1e
[ZEPPELIN-3077] Cron scheduler is easy to get stuck when one of the cron jobs
takes long time or gets stuck
### What is this PR for?
The cron scheduler is easy to get stuck when one of the cron jobs takes long
time or gets stuck.
I sometimes come across the issue that the cron scheduler stops working
suddenly. According to the thread dump of ZeppelinServer, all of the
DefaultQuartzScheduler_Worker threads were waiting for the job's completion and
there was no thread to launch a new job.
Here is the contents of the thread dump:
```
"DefaultQuartzScheduler_Worker-10" #76 prio=5 os_prio=0 tid=0x00007fb41d3b4000
nid=0x1b521 sleeping[0x00007fb3daef1000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
- locked <0x00000000c0a7dbf0> (a java.lang.Object)
Locked ownable synchronizers:
- None
"DefaultQuartzScheduler_Worker-9" #75 prio=5 os_prio=0 tid=0x00007fb41d3b2000
nid=0x1b520 waiting on condition [0x00007fb3daff2000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
- locked <0x00000000c0a7a470> (a java.lang.Object)
Locked ownable synchronizers:
- None
...
"DefaultQuartzScheduler_Worker-2" #68 prio=5 os_prio=0 tid=0x00007fb41d3c8800
nid=0x1b519 waiting on condition [0x00007fb3da473000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
- locked <0x00000000c0a7a7b0> (a java.lang.Object)
Locked ownable synchronizers:
- None
"DefaultQuartzScheduler_Worker-1" #67 prio=5 os_prio=0 tid=0x00007fb41d3cc800
nid=0x1b518 waiting on condition [0x00007fb3da372000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:889)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
- locked <0x00000000c0a7dd90> (a java.lang.Object)
Locked ownable synchronizers:
- None
```
The above thread dump says that all of the worker threads get stuck at
https://github.com/apache/zeppelin/blob/v0.7.3/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java#L889.
One way to reproduce this kind of issue is creating a paragraph whose status is
"READY" and "disable run". That makes the paragraph status "READY" permanently
and `note.isTerminated()` never turns to `true`.
To fix this issue, the following two improvements has been made at this PR:
1. Remove the unnecessary `while (!note.isTerminated()) { ... }` block because
the execution of all of the paragraphs is finished after `note.runAll()`.
2. Skip the cron execution if there is a running or pending paragraph. That
prevents the Zeppelin cron scheduler from getting stuck by the long running
paragraph whose execution duration is greater than the cron execution cycle.
### What type of PR is it?
[Bug]
### Todos
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-3077
### How should this be tested?
* Tested manually.
1. The cron scheduler does not get stuck if there is a paragraph whose
status is "READY" and "disable run".
2. The following message is printed on the log file when the cron job is
launched while the previous cron job still has been running.
* `execution of the cron job is skipped because there is a running or
pending paragraph (note id: XXXXXXXXX)`
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No.
* Is there breaking changes for older versions? No.
* Does this needs documentation? Yes. The behavior of the cron job was changed
not to run if there is a running or pending paragraph by this PR. Thus, the
documentation `docs/usage/other_features/cron_scheduler.md` was also added by
this PR. Its layout is as follow:
<img width="711" alt="screen shot 2017-11-28 at 18 30 54"
src="https://user-images.githubusercontent.com/31149688/33312407-20664e02-d46b-11e7-9715-9e2562d5e064.png">
Author: Keiji Yoshida <[email protected]>
Closes #2687 from kjmrknsn/ZEPPELIN-3077 and squashes the following commits:
81e72188d [Keiji Yoshida] [ZEPPELIN-3077] Cron scheduler is easy to get stuck
when one of the cron jobs takes long time or gets stuck
Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/888a05d1
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/888a05d1
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/888a05d1
Branch: refs/heads/master
Commit: 888a05d1e48a75f237edf7b6bfa7d4e57b100548
Parents: 4bf0f3a
Author: Keiji Yoshida <[email protected]>
Authored: Mon Nov 27 19:48:47 2017 +0900
Committer: Lee moon soo <[email protected]>
Committed: Tue Dec 19 16:05:45 2017 -0800
----------------------------------------------------------------------
docs/_includes/themes/zeppelin/_navigation.html | 1 +
.../img/docs-img/cron_scheduler_dialog_box.png | Bin 0 -> 135264 bytes
docs/usage/other_features/cron_scheduler.md | 52 +++++++++++++++++++
.../java/org/apache/zeppelin/notebook/Note.java | 16 ++++++
.../org/apache/zeppelin/notebook/Notebook.java | 13 +++--
.../apache/zeppelin/notebook/NotebookTest.java | 40 ++++++++++++++
6 files changed, 115 insertions(+), 7 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/_includes/themes/zeppelin/_navigation.html
----------------------------------------------------------------------
diff --git a/docs/_includes/themes/zeppelin/_navigation.html
b/docs/_includes/themes/zeppelin/_navigation.html
index 95d83ea..796b8bc 100644
--- a/docs/_includes/themes/zeppelin/_navigation.html
+++ b/docs/_includes/themes/zeppelin/_navigation.html
@@ -62,6 +62,7 @@
<li><a
href="{{BASE_PATH}}/usage/other_features/personalized_mode.html">Personalized
Mode</a></li>
<li><a
href="{{BASE_PATH}}/usage/other_features/customizing_homepage.html">Customizing
Zeppelin Homepage</a></li>
<li><a
href="{{BASE_PATH}}/usage/other_features/notebook_actions.html">Notebook
Actions</a></li>
+ <li><a
href="{{BASE_PATH}}/usage/other_features/cron_scheduler.html">Cron
Scheduler</a></li>
<li role="separator" class="divider"></li>
<li class="title"><span>REST API</span></li>
<li><a
href="{{BASE_PATH}}/usage/rest_api/interpreter.html">Interpreter API</a></li>
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png
----------------------------------------------------------------------
diff --git
a/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png
b/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png
new file mode 100644
index 0000000..5063398
Binary files /dev/null and
b/docs/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png differ
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/docs/usage/other_features/cron_scheduler.md
----------------------------------------------------------------------
diff --git a/docs/usage/other_features/cron_scheduler.md
b/docs/usage/other_features/cron_scheduler.md
new file mode 100644
index 0000000..e8a9975
--- /dev/null
+++ b/docs/usage/other_features/cron_scheduler.md
@@ -0,0 +1,52 @@
+---
+layout: page
+title: "Running a Notebook on a Given Schedule Automatically"
+description: "You can run a notebook on a given schedule automatically by
setting up a cron scheduler on the notebook."
+group: usage/other_features
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+# Running a Notebook on a Given Schedule Automatically
+
+<div id="toc"></div>
+
+Apache Zeppelin provides a cron scheduler for each notebook. You can run a
notebook on a given schedule automatically by setting up a cron scheduler on
the notebook.
+
+## Setting up a cron scheduler on a notebook
+
+Click the clock icon on the tool bar and open a cron scheduler dialog box.
+
+<img
src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/cron_scheduler_dialog_box.png"
/>
+
+There are the following items which you can input or set:
+
+### Preset
+
+You can set a cron schedule easily by clicking each option such as `1m` and
`5m`. The login user is set as a cron executing user automatically. You can
also clear the cron schedule settings by clicking `None`.
+
+### Cron expression
+
+You can set the cron schedule by filling in this form. Please see [Cron
Trigger
Tutorial](http://www.quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/crontrigger)
for the available cron syntax.
+
+### Cron executing user
+
+You can set the cron executing user by filling in this form and press the
enter key.
+
+### auto-restart interpreter on cron execution
+
+When this checkbox is set to "on", the interpreters which are binded to the
notebook are stopped automatically after the cron execution. This feature is
useful if you want to release the interpreter resources after the cron
execution.
+
+> **Note**: A cron execution is skipped if one of the paragraphs is in a state
of `RUNNING` or `PENDING` no matter whether it is executed automatically (i.e.
by the cron scheduler) or manually by a user opening this notebook.
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
----------------------------------------------------------------------
diff --git
a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
index 6e66732..19f396e 100644
--- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
+++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Note.java
@@ -652,6 +652,22 @@ public class Note implements ParagraphJobListener,
JsonSerializable {
return true;
}
+ /**
+ * Return true if there is a running or pending paragraph
+ */
+ boolean isRunningOrPending() {
+ synchronized (paragraphs) {
+ for (Paragraph p : paragraphs) {
+ Status status = p.getStatus();
+ if (status.isRunning() || status.isPending()) {
+ return true;
+ }
+ }
+ }
+
+ return false;
+ }
+
public boolean isTrash() {
String path = getName();
if (path.charAt(0) == '/') {
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
----------------------------------------------------------------------
diff --git
a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
index 8de981e..d68cd4b 100644
--- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
+++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java
@@ -890,16 +890,15 @@ public class Notebook implements NoteEventListener {
String noteId =
context.getJobDetail().getJobDataMap().getString("noteId");
Note note = notebook.getNote(noteId);
- note.runAll();
- while (!note.isTerminated()) {
- try {
- Thread.sleep(1000);
- } catch (InterruptedException e) {
- logger.error(e.toString(), e);
- }
+ if (note.isRunningOrPending()) {
+ logger.warn("execution of the cron job is skipped because there is a
running or pending " +
+ "paragraph (note id: {})", noteId);
+ return;
}
+ note.runAll();
+
boolean releaseResource = false;
String cronExecutingUser = null;
try {
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/888a05d1/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
----------------------------------------------------------------------
diff --git
a/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
b/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
index ba9e177..83c0932 100644
---
a/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
+++
b/zeppelin-zengine/src/test/java/org/apache/zeppelin/notebook/NotebookTest.java
@@ -362,6 +362,46 @@ public class NotebookTest extends AbstractInterpreterTest
implements JobListener
}
@Test
+ public void testScheduleAgainstRunningAndPendingParagraph() throws
InterruptedException, IOException {
+ // create a note
+ Note note = notebook.createNote(anonymous);
+ interpreterSettingManager.setInterpreterBinding("user", note.getId(),
+ interpreterSettingManager.getInterpreterSettingIds());
+
+ // append running and pending paragraphs to the note
+ for (Status status: new Status[]{Status.RUNNING, Status.PENDING}) {
+ Paragraph p = note.addNewParagraph(AuthenticationInfo.ANONYMOUS);
+ Map config = new HashMap<>();
+ p.setConfig(config);
+ p.setText("p");
+ p.setStatus(status);
+ assertNull(p.getDateFinished());
+ }
+
+ // set cron scheduler, once a second
+ Map config = note.getConfig();
+ config.put("enabled", true);
+ config.put("cron", "* * * * * ?");
+ note.setConfig(config);
+ notebook.refreshCron(note.getId());
+ Thread.sleep(2 * 1000);
+
+ // remove cron scheduler.
+ config.put("cron", null);
+ note.setConfig(config);
+ notebook.refreshCron(note.getId());
+ Thread.sleep(2 * 1000);
+
+ // check if the executions of the running and pending paragraphs were
skipped
+ for (Paragraph p : note.paragraphs) {
+ assertNull(p.getDateFinished());
+ }
+
+ // remove the note
+ notebook.removeNote(note.getId(), anonymous);
+ }
+
+ @Test
public void testSchedulePoolUsage() throws InterruptedException, IOException
{
final int timeout = 30;
final String everySecondCron = "* * * * * ?";