GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/2616
[FLINK-4733] Port WebInterface to metric system
# This PR relies on #2613, #2614 and #2615. Thus, the first 5 commits
should not be reviewed here.
This PR ports the remaining parts of the WebInterface to rely on the metric
system.
# TaskManager metrics
In a7011e8305d7c828fabc4245358c2d21568fd561 the TaskManagersHandler is
modified to use the metric system. In addition, the garbage collector section
in the WebInterface no longer relies on hard-coded GC names but is instead
populated dynamically. The recently introduced network metrics have been added
as well.
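To illustrate what "dynamic" GC names means, here is a minimal sketch that enumerates whichever collectors the running JVM reports through the standard `java.lang.management` API. The class name `GcMetricsSketch` and the `<name>.Count` / `<name>.Time` key layout are assumptions made for the example, not necessarily what the handler in this PR uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcMetricsSketch {

    /**
     * Collects a count and a time entry for every garbage collector the JVM
     * reports, so no collector name has to be hard-coded.
     * The key layout "<name>.Count" / "<name>.Time" is an assumption.
     */
    static Map<String, Long> collectGcMetrics() {
        Map<String, Long> metrics = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collector names depend on the JVM and its configuration.
            String name = gc.getName().replace(' ', '_');
            metrics.put(name + ".Count", gc.getCollectionCount());
            metrics.put(name + ".Time", gc.getCollectionTime());
        }
        return metrics;
    }

    public static void main(String[] args) {
        collectGcMetrics().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A UI that iterates over such a map can render one row per collector without knowing in advance which collectors exist.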
cbff6d6aab80bc423a09aa6b62c80a2f409d796a then removes the remnants of the
old metrics that are now unused. This affects the TaskManager (which no longer
gathers these metrics) and the Heartbeat messages (which no longer include a
metrics report). As a result, the DropWizard dependency was removed. The
transitive jackson dependency is now explicitly set for both flink-runtime and
flink-runtime-web.
# Task metrics
The WebInterface shows how many records/bytes each task has received or
sent. Until now these were gathered with system-specific accumulators. In
cab25496ff5991de60e757f68c5d5139c86f34ba these accumulators were removed.
Under the new system, bytes In/Out is measured per task (since it doesn't
make sense within chained operators), while records In/Out is measured per
operator. In order to display the records metrics for each task it was thus
necessary to "reuse" some operator counters for the task.
This is implemented in 16983485198a61bec0418adb833508dcaf276170 by
re-registering the numRecordsIn counter of the first operator in the chain and
the numRecordsOut counter of the last operator on the task level.
This re-use could (sadly) not be done automatically within the metric
system. Instead, 2 helper methods were added to the OperatorIOMetricGroup. They
are called, for example, within BatchTask#invoke() and forward the counters to
the TaskIOMetricGroup, where they are stored and re-registered.
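The counter re-use described above can be sketched roughly as follows. This is a simplified model, not the code from the commit: `Counter`, `TaskIOGroup`, `OperatorIOGroup` and all method names are hypothetical stand-ins for the real metric group classes.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterReuseSketch {

    /** Minimal stand-in for a metric counter (hypothetical). */
    static final class Counter {
        private long count;
        void inc() { count++; }
        long getCount() { return count; }
    }

    /** Hypothetical task-level group: stores reused counters under task-level names. */
    static final class TaskIOGroup {
        final Map<String, Counter> registered = new HashMap<>();
        void reuseRecordsInputCounter(Counter c) { registered.put("numRecordsIn", c); }
        void reuseRecordsOutputCounter(Counter c) { registered.put("numRecordsOut", c); }
    }

    /** Hypothetical operator-level group with the two helper methods. */
    static final class OperatorIOGroup {
        final Counter numRecordsIn = new Counter();
        final Counter numRecordsOut = new Counter();
        private final TaskIOGroup task;

        OperatorIOGroup(TaskIOGroup task) { this.task = task; }

        // Forward the same counter instances to the task group; nothing is copied.
        void reuseInputMetricsForTask() { task.reuseRecordsInputCounter(numRecordsIn); }
        void reuseOutputMetricsForTask() { task.reuseRecordsOutputCounter(numRecordsOut); }
    }

    /** Runs a two-operator chain and returns {task numRecordsIn, task numRecordsOut}. */
    static long[] demo() {
        TaskIOGroup task = new TaskIOGroup();
        OperatorIOGroup head = new OperatorIOGroup(task);
        OperatorIOGroup tail = new OperatorIOGroup(task);

        head.reuseInputMetricsForTask();   // first operator in the chain
        tail.reuseOutputMetricsForTask();  // last operator in the chain

        for (int i = 0; i < 3; i++) { head.numRecordsIn.inc(); }
        for (int i = 0; i < 2; i++) { tail.numRecordsOut.inc(); }

        return new long[] {
            task.registered.get("numRecordsIn").getCount(),
            task.registered.get("numRecordsOut").getCount()
        };
    }

    public static void main(String[] args) {
        long[] result = demo();
        System.out.println("task numRecordsIn=" + result[0] + ", numRecordsOut=" + result[1]);
    }
}
```

Because the task group holds the same counter instances as the operators, the task-level values follow the operator-level values without any extra bookkeeping per record.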
With these metrics being re-registered they can be accessed easily via the
MetricQueryService from the WebInterface handlers. The downside is that this
service provides no guarantee that the most up-to-date metrics for a finished
task will be transferred. It was thus necessary to store a snapshot of these
IOMetrics within the ExecutionGraph, similar to the system accumulators, which
the handlers could access as well.
The handlers were finally adjusted in
8be5145a9406dc8d6d661299c9ee98aa09233df4. For running tasks they access metrics
via the MetricQueryService, whereas for finished tasks they rely on the metrics
stored in the ExecutionGraph.
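The lookup rule for the handlers can be summarized in a small sketch. It assumes a hypothetical `IOMetrics` value class and models the live query as a `Supplier`; the actual handler and MetricQueryService interfaces are not shown in this description and may look different.

```java
import java.util.function.Supplier;

public class IoMetricsLookupSketch {

    /** Hypothetical immutable snapshot of a task's record counters. */
    static final class IOMetrics {
        final long numRecordsIn;
        final long numRecordsOut;

        IOMetrics(long numRecordsIn, long numRecordsOut) {
            this.numRecordsIn = numRecordsIn;
            this.numRecordsOut = numRecordsOut;
        }
    }

    /**
     * Picks the metric source for a task:
     * finished tasks use the snapshot stored with the execution graph,
     * running tasks use the live query (standing in for the MetricQueryService).
     */
    static IOMetrics resolve(boolean finished, IOMetrics storedSnapshot, Supplier<IOMetrics> liveQuery) {
        return finished ? storedSnapshot : liveQuery.get();
    }

    public static void main(String[] args) {
        IOMetrics snapshot = new IOMetrics(10, 7);
        IOMetrics live = new IOMetrics(4, 2);

        IOMetrics forFinished = resolve(true, snapshot, () -> live);
        IOMetrics forRunning = resolve(false, snapshot, () -> live);

        System.out.println("finished task records in: " + forFinished.numRecordsIn);
        System.out.println("running task records in: " + forRunning.numRecordsIn);
    }
}
```

Passing the live query as a `Supplier` keeps it lazy, so the sketch never contacts the query service for a task that has already finished.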
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 4733_metrics_port
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2616.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2616
----
commit 5f0f3598fa5d0fdf8b61d591e2bb94b74924ee0d
Author: zentol <[email protected]>
Date: 2016-10-07T11:02:10Z
[FLINK-4773] [metrics] [refactor] Rename IOMetricGroup to TaskIOMetricGroup
commit df40a58c74e7f0fc3feec4a5848f1627bf4537dd
Author: zentol <[email protected]>
Date: 2016-10-05T13:04:03Z
[FLINK-4773] [metrics] [refactor] Introduce OperatorIOMetricGroup
commit 2685f6a908a0ce4cc9fe3d97beca005ea3d59ee5
Author: zentol <[email protected]>
Date: 2016-10-07T08:11:31Z
[FLINK-4772] [metrics] Store metrics as strings in MetricStore
commit 33297e716a0a327fad20331813a582642c5e68e3
Author: zentol <[email protected]>
Date: 2016-10-07T08:16:49Z
[FLINK-4775] [metrics] Simplify MetricStore access
commit dfed8166272b361684594f61b401c38f0d68ebd6
Author: zentol <[email protected]>
Date: 2016-10-07T11:11:58Z
[FLINK-4774] [metrics] [hotfix] Fix scope concatenation in QueryScopeInfo
commit a7011e8305d7c828fabc4245358c2d21568fd561
Author: zentol <[email protected]>
Date: 2016-10-07T11:12:31Z
[FLINK-4733] [metrics] Port TaskManagersHandler
commit cbff6d6aab80bc423a09aa6b62c80a2f409d796a
Author: zentol <[email protected]>
Date: 2016-10-07T11:12:41Z
[FLINK-4733] [metrics] Remove old TaskManager metrics
commit cab25496ff5991de60e757f68c5d5139c86f34ba
Author: zentol <[email protected]>
Date: 2016-10-05T13:12:22Z
[FLINK-4733] [metrics] Remove system accumulators
commit 16983485198a61bec0418adb833508dcaf276170
Author: zentol <[email protected]>
Date: 2016-10-07T08:15:50Z
[FLINK-4733] [metrics] Reuse operator numRecordsIn/Out counter for task
commit 8be5145a9406dc8d6d661299c9ee98aa09233df4
Author: zentol <[email protected]>
Date: 2016-10-07T10:59:20Z
[FLINK-4733] [metrics] Port job/task handlers
----