GitHub user egorklimov opened a pull request:
https://github.com/apache/zeppelin/pull/3110
[ZEPPELIN-3671] Add info about running interpreters PIDs to API and JMX
### What is this PR for?
This is a continuation of
[#3102](https://github.com/apache/zeppelin/pull/3102).
It would be useful to get the PID of each running interpreter, and the group of
paragraphs running under that interpreter, through the API and JMX.
With this information it becomes easy to check CPU and memory load, etc.
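As a rough illustration of the CPU and memory check mentioned above, the sketch below asks the standard `ps` utility about a single PID. It is not part of this PR: the helper names `parse_ps_line` and `process_load` are hypothetical, and it assumes a POSIX-like host where `ps` accepts the `pcpu` and `rss` output columns.

```python
import os
import subprocess


def parse_ps_line(line: str) -> dict:
    """Parse one output line of `ps -o pcpu= -o rss=`.

    The first field is the CPU usage in percent, the second the
    resident set size as reported by ps (kilobytes on Linux).
    """
    cpu, rss = line.split()
    return {"cpu_percent": float(cpu), "rss_kib": int(rss)}


def process_load(pid: int) -> dict:
    """Return CPU and memory figures for one process (hypothetical helper).

    Raises subprocess.CalledProcessError if ps exits non-zero,
    for example when no process with this PID exists.
    """
    result = subprocess.run(
        # An empty header text after "=" suppresses the header line.
        ["ps", "-o", "pcpu=", "-o", "rss=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
        # Force the C locale so the decimal separator is a dot.
        env={**os.environ, "LC_ALL": "C"},
    )
    return parse_ps_line(result.stdout)
```

Calling `process_load(pid)` with a PID reported for an interpreter would give one sample; polling it in a loop is one way to build a simple load chart.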
This PR adds:
* An API method that returns info about running interpreters
(`/api/interpreter/running`);
* An API method that returns info about running paragraphs, grouped by
interpreter (`/api/notebook/jobmanager/running`);
* A few JMX methods that expose the same data as the API;
* A template for simple analysis of running paragraphs using the API.
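A minimal sketch of how a client might call the two endpoints listed above. Only the endpoint paths come from this PR. The base URL, the use of unauthenticated GET requests, and the helper names are assumptions, and the response is printed as-is because its schema is not described here.

```python
import json
import urllib.request

# Assumption: a local Zeppelin server on port 8080; adjust as needed.
ZEPPELIN_URL = "http://localhost:8080"

# Endpoint paths taken from this PR description.
RUNNING_INTERPRETERS = "/api/interpreter/running"
RUNNING_PARAGRAPHS = "/api/notebook/jobmanager/running"


def endpoint_url(base_url: str, path: str) -> str:
    """Join the server base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(base_url: str, path: str, timeout: float = 10.0):
    """GET an endpoint and decode the JSON body without assuming its schema."""
    url = endpoint_url(base_url, path)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def print_running(base_url: str = ZEPPELIN_URL) -> None:
    """Pretty-print the raw responses of both endpoints (hypothetical helper)."""
    for path in (RUNNING_INTERPRETERS, RUNNING_PARAGRAPHS):
        print(path)
        print(json.dumps(fetch_json(base_url, path), indent=2))
```

Calling `print_running()` against a running server would dump both responses. If authentication is enabled on the server, the request would also need to carry the session credentials.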
Part of the discussion from the previous PR:
> author: @zjffdu
> Thanks @egorklimov, this is an interesting feature.
> This assumes that every interpreter process generates a PID file, but that
> assumption does not hold. There are at least 2 exceptions for now. One is the
> yarn-cluster mode of Spark, where the interpreter runs on a remote node of the
> YARN cluster. The other is running the interpreter in a container, which is on
> our roadmap. Do you have any ideas on how to handle these 2 scenarios?
>
> author: @egorklimov
> If I'm not mistaken, according to `bin/interpreter.sh:234-238`:
> ```
> if [[ -z "${pid}" ]]; then
> exit 1;
> else
> echo ${pid} > ${ZEPPELIN_PID}
> fi
> ```
> The PID file is generated every time (except when the interpreter process
> fails to start). In the YARN scenario, however, it will be hard to analyze CPU
> and memory load because resources are consumed on each node; maybe someone
> will add one more template for that case.
>
> For containers, I suppose we could write other info to the run folder.
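The PID-file convention discussed above could be consumed roughly as follows. This is only a sketch under assumptions: the file path is whatever `${ZEPPELIN_PID}` resolves to on the host (passed in by the caller here), the helper names are hypothetical, and the liveness check relies on POSIX signal semantics.

```python
import os
from pathlib import Path


def read_pid(pid_file) -> int:
    """Read the PID that the launcher script wrote to the given file."""
    return int(Path(pid_file).read_text().strip())


def is_alive(pid: int) -> bool:
    """Check whether a process with this PID exists (POSIX only).

    Signal 0 performs the existence and permission checks
    without delivering a signal to the process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```

A stale PID file (interpreter already gone) shows up as `is_alive(read_pid(path))` returning `False`. This does not cover the yarn-cluster and container cases raised in the discussion, where the process is not local.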
### What type of PR is it?
Improvement
### What is the Jira issue?
Issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN-3671
### How should this be tested?
* CI failed on the [third
test](https://travis-ci.org/TinkoffCreditSystems/zeppelin/jobs/411783078)
in `org.apache.zeppelin.rest.ZeppelinRestApiTest.testJobs()` with a job
time limit error;
* Tests added.
### Screenshots
Example of tables built on response data:
* Interpreters:

* Paragraphs:

Tree of processes associated with running:

The same data using JMX (viewed in jconsole):
* Running interpreters:

* Running paragraphs:

### Questions:
* Do the license files need an update? Yes, Common Public License Version
1.0 was added
* Are there breaking changes for older versions? No
* Does this need documentation? Yes, the API documentation should be updated
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/TinkoffCreditSystems/zeppelin DW-17571
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/3110.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3110
----
commit ede27cf5ce9583b3240ad87ac999d8151d450e66
Author: egorklimov <klim.electronicmail@...>
Date: 2018-07-24T13:28:43Z
MBean register fixed
commit b09982228b58ebbf9a3ce2ce16f4a6bd2da9622c
Author: egorklimov <klim.electronicmail@...>
Date: 2018-07-24T16:22:20Z
Running statistics functions added
Bugs fixed
commit adde2e3e62ff9da9fd72c450535caf464417f1c4
Author: Egor Klimov <klimovgeor@...>
Date: 2018-07-30T12:00:10Z
Add templates
commit 7eb26f67de2ee7ad9a27e0e5d1a6adff5d11a7a4
Author: egorklimov <klim.electronicmail@...>
Date: 2018-07-30T15:54:15Z
Tests added
----