GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/2363
[FLINK-4389] Expose metrics to WebFrontend
This PR exposes metrics to the WebFrontend, as proposed in
[FLIP-7](https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface).
This PR builds on top of #2300, meaning that 2866f56 is not part of the PR.
I've split the implementation into 5 commits that implement
* the generation of a separate scope string for the WebInterface
* the MetricQueryService, a separate actor running on all Job-/TaskManagers
whose main purpose is to create and return a dump of the metrics when queried
* the MetricStore, a nested data structure used in the WebInterface to
store transmitted metrics
* the MetricFetcher, which is used by the WebInterface to fetch metrics
from Job-/TaskManagers
* various MetricsHandler classes, which handle REST calls requesting
specific metrics
### MetricQueryService
The MetricQueryService is an actor running inside the MetricRegistry. It acts
like an unscheduled reporter that is queried from the outside for a report. The
MetricRegistry notifies it of added and removed metrics. The MetricFetcher
sends report requests to the JM/TM, which forwards them to the
MetricQueryService; the service then replies directly to the MetricFetcher.
The report is one big `Object[]`, which contains, for each metric:
1. the type of the metric, encoded as a byte (so that we know how many
values are transmitted)
2. the fully qualified metric name (based on the separate format)
3. the value(s) of the metric (turned into Strings for Gauges)
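To illustrate the flat layout, here is a minimal sketch of how such a dump could be written and read back. The type codes, the number of values per type (in particular the two-value histogram), and all class and method names are assumptions for illustration only; they are not the classes added by this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a flat metric dump; not the PR's actual code. */
class MetricDumpSketch {
    // Assumed type codes; the real byte values are not stated in this description.
    static final byte TYPE_COUNTER = 0;
    static final byte TYPE_GAUGE = 1;
    static final byte TYPE_HISTOGRAM = 2;

    /** Appends [type, name, count]. */
    static void addCounter(List<Object> dump, String name, long count) {
        dump.add(TYPE_COUNTER);
        dump.add(name);
        dump.add(count);
    }

    /** Appends [type, name, value]; gauge values are turned into Strings. */
    static void addGauge(List<Object> dump, String name, Object value) {
        dump.add(TYPE_GAUGE);
        dump.add(name);
        dump.add(String.valueOf(value));
    }

    /** Appends [type, name, min, max]; sending only min and max is an assumption. */
    static void addHistogram(List<Object> dump, String name, long min, long max) {
        dump.add(TYPE_HISTOGRAM);
        dump.add(name);
        dump.add(min);
        dump.add(max);
    }

    /** The type byte tells the reader how many values follow the name. */
    static int valueCount(byte type) {
        switch (type) {
            case TYPE_COUNTER: return 1;
            case TYPE_GAUGE: return 1;
            case TYPE_HISTOGRAM: return 2;
            default: throw new IllegalArgumentException("Unknown metric type: " + type);
        }
    }

    /** Walks the flat array and groups the values by metric name. */
    static Map<String, List<Object>> decode(Object[] dump) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        int i = 0;
        while (i < dump.length) {
            byte type = (Byte) dump[i++];
            String name = (String) dump[i++];
            List<Object> values = new ArrayList<>();
            for (int n = valueCount(type); n > 0; n--) {
                values.add(dump[i++]);
            }
            result.put(name, values);
        }
        return result;
    }
}
```

Without the leading type byte the reader could not tell where one metric ends and the next begins, since different metric types contribute a different number of entries.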
### MetricStore
The MetricStore is a relatively simple nested data structure that contains
one `HashMap<String, Object>` for every JM/TM/job/task. Received metrics are
added to these HashMaps based on the format string. There is only a single
MetricStore instance in the WebInterface.
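A rough sketch of what such a nested structure could look like is shown below. The field names, the nesting of tasks under jobs, and the use of explicit IDs instead of parsing the scope string are assumptions made for illustration; the actual MetricStore in this PR may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a nested metric store; names are illustrative only. */
class MetricStoreSketch {
    /** Metrics of the JobManager itself. */
    final Map<String, Object> jobManager = new HashMap<>();
    /** TaskManager ID -> metrics of that TaskManager. */
    final Map<String, Map<String, Object>> taskManagers = new HashMap<>();
    /** Job ID -> per-job store. */
    final Map<String, JobStore> jobs = new HashMap<>();

    static class JobStore {
        final Map<String, Object> metrics = new HashMap<>();
        /** Task ID -> metrics of that task (assumed to be nested under the job). */
        final Map<String, Map<String, Object>> tasks = new HashMap<>();
    }

    void addTaskManagerMetric(String tmId, String name, Object value) {
        taskManagers.computeIfAbsent(tmId, k -> new HashMap<>()).put(name, value);
    }

    void addJobMetric(String jobId, String name, Object value) {
        jobs.computeIfAbsent(jobId, k -> new JobStore()).metrics.put(name, value);
    }

    void addTaskMetric(String jobId, String taskId, String name, Object value) {
        jobs.computeIfAbsent(jobId, k -> new JobStore())
            .tasks.computeIfAbsent(taskId, k -> new HashMap<>())
            .put(name, value);
    }
}
```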
### MetricFetcher
The MetricFetcher initiates the transfer and cleanup of metrics. It
contains the MetricStore instance, which is accessed by the MetricsHandlers.
Fetching is only done when a handler asks for it, with a minimum interval of 10
seconds between updates. As such, no fetching will be done if the metrics are
not accessed with REST calls.
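The on-demand behaviour with a minimum interval can be sketched as follows. This is a simplified illustration, not the PR's MetricFetcher: the class and method names are hypothetical, the current time is passed in explicitly to keep the example deterministic, and the actual fetch is replaced by a counter.

```java
/** Hypothetical sketch of on-demand fetching with a minimum interval between updates. */
class ThrottledFetcherSketch {
    /** 10 seconds between updates, as described above. */
    static final long MIN_INTERVAL_MS = 10_000L;

    private long lastFetchMs;
    private boolean fetchedOnce;
    /** Counts how often a fetch was actually triggered. */
    int fetchCount;

    /**
     * Called by a handler before it reads from the store.
     * Returns true if a fetch was triggered, false if the last one is recent enough.
     */
    synchronized boolean update(long nowMs) {
        if (fetchedOnce && nowMs - lastFetchMs < MIN_INTERVAL_MS) {
            return false;
        }
        lastFetchMs = nowMs;
        fetchedOnce = true;
        fetch();
        return true;
    }

    /** Placeholder for the real work of querying the JM and TMs. */
    private void fetch() {
        fetchCount++;
    }
}
```

Because `update` is only ever called from a handler, nothing is fetched while no REST calls arrive.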
The fetching procedure can be summed up in pseudo-code as follows:
```
fetch():
    askJobManagerForJobDetails()
        => retain all metrics belonging to the given jobs
    askJobManagerForMetrics()
        => add received metrics to MetricStore
    askJobManagerForRegisteredTaskManagers()
        => retain all metrics belonging to registered task managers
        => for each TaskManager:
            askTaskManagerForMetrics()
                => add received metrics to MetricStore
```
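The "retain" steps amount to dropping every entry whose job or TaskManager is no longer known to the JobManager. A minimal sketch of that cleanup, assuming the store is keyed by ID (the helper name is hypothetical and not part of this PR):

```java
import java.util.Collection;
import java.util.Map;

/** Hypothetical sketch of the cleanup ("retain") step of the fetch cycle. */
class RetainSketch {
    /**
     * Removes all entries whose key is not among the currently active IDs.
     * Removing keys through the key-set view also removes the mapped values.
     */
    static <V> void retainActive(Map<String, V> store, Collection<String> activeIds) {
        store.keySet().retainAll(activeIds);
    }
}
```

Running the cleanup before adding the freshly received metrics keeps metrics of finished jobs and unregistered TaskManagers from accumulating in the store.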
### MetricsHandler
The MetricsHandlers deal with two requests:
* getAllAvailableMetrics - any REST request that does not have a `get`
query parameter is treated as a request for all available metrics for a given
JM/TM/job/task, denoted by the REST path. The reply will be a JSON array, for
example: `[{"id":"metric_1"},{"id":"metric_2"}]`
* getMetricValues - the WebFrontend can request the values for several
metrics by passing a comma-separated list of metric IDs as the `get` query
parameter. The reply will be a JSON array of id:value pairs, for example:
`[{"id":"metric_1", "value":"4"}]` or an empty string if an error occurred.
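The two request types can be sketched with a single method that switches on the presence of the `get` parameter. This is an illustrative simplification, not the PR's handler classes: the class and method names are hypothetical, IDs are sorted only to make the output deterministic, skipping unknown IDs is an assumption, and a real handler would use a JSON library instead of building the string by hand.

```java
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of the two kinds of requests a metrics handler answers. */
class MetricsHandlerSketch {
    /**
     * @param metrics  the metrics of the JM/TM/job/task addressed by the REST path
     * @param getParam value of the `get` query parameter, or null if absent
     */
    static String handle(Map<String, Object> metrics, String getParam) {
        StringBuilder json = new StringBuilder("[");
        if (getParam == null) {
            // getAllAvailableMetrics: list the IDs only.
            for (String id : new TreeSet<>(metrics.keySet())) {
                separate(json).append("{\"id\":\"").append(escape(id)).append("\"}");
            }
        } else {
            // getMetricValues: return id:value pairs for the requested IDs.
            for (String id : getParam.split(",")) {
                Object value = metrics.get(id);
                if (value == null) {
                    continue; // assumption: unknown IDs are silently skipped
                }
                separate(json).append("{\"id\":\"").append(escape(id))
                    .append("\",\"value\":\"").append(escape(String.valueOf(value)))
                    .append("\"}");
            }
        }
        return json.append("]").toString();
    }

    /** Adds a comma before every element except the first. */
    private static StringBuilder separate(StringBuilder json) {
        return json.length() > 1 ? json.append(",") : json;
    }

    /** Escapes backslashes and quotes only; control characters are not handled. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```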
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 4389_metrics_exposed
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2363.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2363
----
commit ea0e4d892717f042acf26ec9653a2371d7b21028
Author: zentol <[email protected]>
Date: 2016-07-27T09:25:27Z
[FLINK-4245] Expose all defined variables
commit ea1154644566f8009ccda64a0acbdde7d59ad235
Author: zentol <[email protected]>
Date: 2016-08-05T11:54:37Z
Implement Query Scope
Modifies various MetricGroups to return a separate scope for the query
service.
commit 3791a94529d703351dffb284ed3d5d19f1ce272c
Author: zentol <[email protected]>
Date: 2016-08-05T11:49:10Z
Implement MetricQueryService
Used on the JM/TM to create a key-value representation of all metrics.
commit a0e1418decc8a3a4b53da15dc744f1702247db9f
Author: zentol <[email protected]>
Date: 2016-08-05T11:48:06Z
Implement MetricStore
Data structure used in the WebInterface to store the transmitted metrics.
commit 2bab6cc32c139f5969a276e385ed5afd6c6a46ea
Author: zentol <[email protected]>
Date: 2016-08-08T12:52:01Z
Implement MetricFetcher
The MetricFetcher regularly fetches metrics from the JM and all TM's.
commit de4aeaf1e0958b49531adae198345b87ccd260bd
Author: zentol <[email protected]>
Date: 2016-08-05T11:48:22Z
Implement various MetricsHandler
Handlers that answers metric related queries.
----