GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/1947
[FLINK-1502] Basic Metric System
This PR is a preview of the new metric system.
It is not complete because
* there is no documentation for the website
* a few smaller parts also don't have code documentation
* I haven't tried out the ganglia/statsD reporter yet
In general though it works and it is now time to gather some feedback.
The PR is organized into several commits to give it some structure;
generally divided by which part of the system they expose the metric system to.
Note that the last commit, "Metric Usage Examples", is not technically part of
the PR but showcases the usage.
The division was done very simply, so some changes may technically belong
to several commits.
## General overview
A user can access a system-provided MetricGroup to register Metrics, which
are stored in a MetricRegistry and forwarded regularly to a Reporter, which
communicates them to an external system.
## MetricGroups
MetricGroups are the user-facing part of the system. They are a nested data
structure, containing other groups and metrics, that allow registering metrics
with Flink while organizing them in a hierarchy.
For example, every TaskManager has a MetricGroup, and for every task that
is deployed a new sub-group for that task is added. This task specific group is
propagated through the task stack, with new groups/metrics being added. Within
a UDF the operator MetricGroup is accessed through the RuntimeContext.
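To make the hierarchy more concrete, here is a minimal sketch of a nested group structure. It is not the code from this PR: the class `GroupSketch`, its methods `addGroup`, `register`, `scope` and `qualifiedNames`, and the group/metric names are all hypothetical and only illustrate how sub-groups could be chained from the TaskManager level down to an operator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; these are not the PR's actual classes.
class GroupSketch {
    private final String name;
    private final GroupSketch parent;
    private final Map<String, GroupSketch> children = new LinkedHashMap<>();
    private final Map<String, Supplier<Object>> metrics = new LinkedHashMap<>();

    GroupSketch(String name, GroupSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    // Returns the sub-group with the given name, creating it on first access.
    GroupSketch addGroup(String childName) {
        return children.computeIfAbsent(childName, n -> new GroupSketch(n, this));
    }

    // Registers a value supplier under this group.
    void register(String metricName, Supplier<Object> value) {
        metrics.put(metricName, value);
    }

    // Dot-separated path from the root group down to this group.
    String scope() {
        return parent == null ? name : parent.scope() + "." + name;
    }

    // Collects the fully qualified names of all metrics in this subtree.
    List<String> qualifiedNames() {
        List<String> out = new ArrayList<>();
        for (String m : metrics.keySet()) {
            out.add(scope() + "." + m);
        }
        for (GroupSketch g : children.values()) {
            out.addAll(g.qualifiedNames());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up names: one TaskManager group, one task, one operator.
        GroupSketch tm = new GroupSketch("taskmanager", null);
        GroupSketch operator = tm.addGroup("myTask").addGroup("myOperator");
        operator.register("recordsSeen", () -> 42L);
        System.out.println(tm.qualifiedNames());
    }
}
```

In this sketch the position in the hierarchy determines the metric's full name, which is one plausible reason to hand a task-specific group down the stack instead of a global one.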
## Metrics
Metrics are the objects used to measure something.
Metrics include
* Gauges, that measure a value on-demand
* Meters, that measure the rate/count of events
* Histograms, that measure the distribution of long values
* Counters, that count stuff
* Timers, that measure rate of calls and distribution of execution time for
a given piece of code.
Under the hood we use the Metrics from the Dropwizard library. In order to
ensure interface stability, and to give us the option to reimplement things
without breaking everything, they (and other classes) are wrapped to match our
interfaces.
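The wrapping idea can be sketched as follows. This is an illustration under assumptions, not the PR's code: the names `WrapperSketch`, `Counter`, `Gauge` and `WrappedCounter` are hypothetical, and `java.util.concurrent.atomic.LongAdder` merely stands in for the library class so that the example has no external dependency.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration; interface and class names are not taken from the PR.
class WrapperSketch {
    // The stable interface that user code would program against.
    interface Counter {
        void inc();
        long getCount();
    }

    // A gauge computes its value on demand when asked.
    interface Gauge<T> {
        T getValue();
    }

    // Wrapper that hides the backing implementation. LongAdder stands in for
    // the library class here; replacing it would not change the Counter interface.
    static final class WrappedCounter implements Counter {
        private final LongAdder delegate = new LongAdder();

        @Override
        public void inc() {
            delegate.increment();
        }

        @Override
        public long getCount() {
            return delegate.sum();
        }
    }

    public static void main(String[] args) {
        Counter records = new WrappedCounter();
        records.inc();
        records.inc();
        // The gauge reads the counter only when getValue() is called.
        Gauge<Long> doubled = () -> records.getCount() * 2;
        System.out.println(records.getCount() + " " + doubled.getValue());
    }
}
```

Because callers only see `Counter` and `Gauge`, the delegate could later be swapped for a different implementation without touching user code.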
## Reporters
Reporters are the components that communicate the Metrics to the outside
world. With this PR we allow exporting Metrics via JMX (default), Graphite,
Ganglia and StatsD. The interval at which they report is configurable.
Similarly to Metrics, we partially use reporters from the Dropwizard
library (Graphite, Ganglia), again wrapped to match our interfaces.
Reporters are configured via flink-conf.yaml.
An example configuration might look like this:
metrics.reporter.class: org.apache.flink.metrics.GraphiteReporter
metrics.reporter.arguments: --host localhost --port 8080
metrics.reporter.interval: 30 SECONDS
Reporters are instantiated generically and configured with a Configuration
containing the parsed arguments. Reporters other than the JMXReporter are not
part of the distribution and have to be added to the classpath manually
(usually by putting the jar into /lib).
JMX uses port 9010 by default. This can be configured by setting the
metrics.jmx.port property in flink-conf.yaml.
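As an illustration of what "parsed arguments" could mean for the example configuration above, here is a minimal sketch. The parsing in the PR itself may differ: `ReporterConfigSketch`, `parseArguments` and `parseIntervalMillis` are hypothetical names, and the assumed formats are `--key value` pairs and `<number> <TimeUnit name>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; the PR's actual parsing code may differ.
class ReporterConfigSketch {
    // Turns "--host localhost --port 8080" into {host=localhost, port=8080}.
    // A trailing key without a value is ignored in this sketch.
    static Map<String, String> parseArguments(String arguments) {
        Map<String, String> parsed = new LinkedHashMap<>();
        String[] tokens = arguments.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            if (!tokens[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key, got: " + tokens[i]);
            }
            parsed.put(tokens[i].substring(2), tokens[i + 1]);
        }
        return parsed;
    }

    // Turns "30 SECONDS" into a duration in milliseconds.
    static long parseIntervalMillis(String interval) {
        String[] parts = interval.trim().split("\\s+");
        long amount = Long.parseLong(parts[0]);
        TimeUnit unit = TimeUnit.valueOf(parts[1].toUpperCase());
        return unit.toMillis(amount);
    }

    public static void main(String[] args) {
        System.out.println(parseArguments("--host localhost --port 8080"));
        System.out.println(parseIntervalMillis("30 SECONDS"));
    }
}
```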
## Registry
The registry is essentially just a connection between all MetricGroups and
the Reporter.
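A rough sketch of that connection, under assumptions: `RegistrySketch`, its nested `Reporter` interface, and the methods `register` and `reportOnce` are hypothetical names, not the classes in this PR. The sketch only shows the shape of the data flow: groups register metrics with the registry, and the registry periodically hands their current values to a reporter.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration; names and signatures are not taken from the PR.
class RegistrySketch {
    // Receives a snapshot of all current metric values.
    interface Reporter {
        void report(Map<String, Object> snapshot);
    }

    private final Map<String, Supplier<Object>> metrics = new ConcurrentHashMap<>();
    private final Reporter reporter;

    RegistrySketch(Reporter reporter) {
        this.reporter = reporter;
    }

    // Called by metric groups when a user registers a metric.
    void register(String qualifiedName, Supplier<Object> value) {
        metrics.put(qualifiedName, value);
    }

    // Reads every metric and hands the values to the reporter. A scheduler
    // would invoke this at the configured reporting interval.
    void reportOnce() {
        Map<String, Object> snapshot = new TreeMap<>();
        metrics.forEach((name, value) -> snapshot.put(name, value.get()));
        reporter.report(snapshot);
    }

    public static void main(String[] args) {
        // The reporter here just prints; a real one would talk to an external system.
        RegistrySketch registry = new RegistrySketch(snapshot -> System.out.println(snapshot));
        registry.register("taskmanager.myTask.recordsRead", () -> 7L);
        registry.reportOnce();
    }
}
```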
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink metrics_v2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1947.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1947
----
commit b90b53cd73824389b41978f0113ca0c6d3da1422
Author: zentol <[email protected]>
Date: 2016-04-15T13:57:14Z
Add basic metric structures
-add dropwizard dependency to flink-core
-add metric wrappers
-add metric groups/category organization
-add metric registry
commit 45e6e123d37a8fba1bf76386a84436e8fb04a9fa
Author: zentol <[email protected]>
Date: 2016-04-19T11:28:28Z
Graphite/Ganglia/StatsD Reporters
commit e634060d83f2b475e954c67424ba39e3ffd92b6b
Author: zentol <[email protected]>
Date: 2016-04-13T16:47:04Z
Task Integration
-included job name in TaskDeploymentDescriptor
-enabled remote JMX for TaskManager
-added TaskManager status metrics
commit 20ca6c3b19690e08335e31fcf3377f4a511e9b00
Author: zentol <[email protected]>
Date: 2016-04-13T14:50:16Z
Environment Integration
-add MetricGroup field to environment
-primary location to retrieve tm/task/subtask keyed metricgroup
commit e8eed4d27361ea311dbf9e9694cca70633d5b54e
Author: zentol <[email protected]>
Date: 2016-04-13T14:23:54Z
IO Metrics Integration
-add metrics for records/bytes read/written
commit f47161db1804909f46520844d23a4e3148387f7b
Author: zentol <[email protected]>
Date: 2016-04-14T10:02:51Z
Streaming Operator Integration
commit c0c2d967dd53ceac966af4b7400982de5e53a272
Author: zentol <[email protected]>
Date: 2016-04-13T15:17:15Z
Batch Operator Integration
-add getMetricGroup() method to TaskContext for driver access
-add MetricGroup field to ChainedDriver for chained driver access
commit fa7a8947bde42333748ae02d7c02023f89d20e41
Author: zentol <[email protected]>
Date: 2016-04-13T14:51:46Z
Context Integration
-add getMetricGroup() method to udf-context for udf/IO-format access
commit 9082d0697ad7f5c9146d77c932eb551eabba40ac
Author: zentol <[email protected]>
Date: 2016-04-13T14:58:38Z
Metric Usage Examples
----