GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/2616

    [FLINK-4733] Port WebInterface to metric system

    # This PR relies on #2613, #2614 and #2615. Thus, the first 5 commits 
should not be reviewed here.
    
    This PR ports the remaining parts of the WebInterface to rely on the metric 
system.
    
    # TaskManager metrics
    
    In a7011e8305d7c828fabc4245358c2d21568fd561 the TaskManagersHandler is 
modified to use the metric system. In addition, the garbage collector section 
in the WebInterface was enhanced to no longer rely on hard-coded GC names, but 
instead be dynamic. The recently introduced network metrics have been added as 
well.
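    As a rough illustration of what "dynamic" GC names can mean, the sketch below 
enumerates whatever collectors the running JVM reports instead of looking up fixed 
names. It assumes the values originate from the standard GarbageCollectorMXBeans; 
the class GcMetricsSketch is hypothetical and not part of this PR.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: gathers GC statistics keyed by
// whatever collector names the running JVM reports.
final class GcMetricsSketch {

    /** Returns collector name -> {collection count, collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value is -1 if the JVM does not expose that statistic.
            result.put(gc.getName(),
                    new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }
}
```

    A UI built on such a map can render one row per entry, so switching the 
collector needs no code change.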
    
    cbff6d6aab80bc423a09aa6b62c80a2f409d796a then removes the remnants of the 
old metrics that are now unused. This affects the TaskManager (no longer gathers 
these metrics) and Heartbeat messages (no longer include a metrics report). As 
a result the DropWizard dependency was removed. The transitive Jackson 
dependency is now explicitly set for both flink-runtime and flink-runtime-web.
    
    # Task metrics
    
    The WebInterface shows how many records/bytes each task has received or 
sent. Until now these were gathered with system-specific accumulators.
    
    In cab25496ff5991de60e757f68c5d5139c86f34ba these accumulators were removed.
    
    Under the new system, bytes In/Out is measured per task (since it doesn't 
make sense within chained operators), while records In/Out is measured per 
operator. In order to display the records metrics for each task it was thus 
necessary to "reuse" some operator counters for the task.
    
    This is implemented in 16983485198a61bec0418adb833508dcaf276170 by 
re-registering the numRecordsIn counter of the first operator in the chain and 
the numRecordsOut counter of the last operator on the task level.
    
    This re-use could (sadly) not be done automatically within the metric 
system. Instead, two helper methods were added to the OperatorIOMetricGroup. 
They are called, for example, within BatchTask#invoke() and forward the counters 
to the TaskIOMetricGroup, where they are stored and re-registered.
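    The forwarding described above can be sketched roughly as follows. This is a 
simplified model, not the code from the commit: CounterSketch, TaskIOSketch, 
OperatorIOSketch and all method names are hypothetical stand-ins for the real 
counter and metric group classes.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; none of these are Flink's real classes.
final class CounterSketch {
    private long count;
    void inc() { count++; }
    long getCount() { return count; }
}

final class TaskIOSketch {
    // Task-level registry: metric name -> the (shared) counter instance.
    private final Map<String, CounterSketch> registered = new HashMap<>();

    void reuseRecordsInputCounter(CounterSketch c) { registered.put("numRecordsIn", c); }
    void reuseRecordsOutputCounter(CounterSketch c) { registered.put("numRecordsOut", c); }

    long get(String name) {
        CounterSketch c = registered.get(name);
        return c == null ? 0L : c.getCount();
    }
}

final class OperatorIOSketch {
    final CounterSketch numRecordsIn = new CounterSketch();
    final CounterSketch numRecordsOut = new CounterSketch();
    private final TaskIOSketch task;

    OperatorIOSketch(TaskIOSketch task) { this.task = task; }

    // Hypothetical helper: call on the first operator of a chain.
    void reuseInputForTask() { task.reuseRecordsInputCounter(numRecordsIn); }

    // Hypothetical helper: call on the last operator of a chain.
    void reuseOutputForTask() { task.reuseRecordsOutputCounter(numRecordsOut); }
}
```

    Because the task registers the same counter object rather than a copy, every 
increment made by the operator is visible on the task level without extra 
bookkeeping.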
    
    With these metrics being re-registered they can be accessed easily via the 
MetricQueryService from the WebInterface handlers. The downside is that this 
service provides no guarantee that the most up-to-date metrics for a finished 
task will be transferred. It was thus necessary to store a snapshot of these 
IOMetrics within the ExecutionGraph, similar to the system accumulators, which 
the handlers could access as well.
    
    The handlers were finally adjusted in 
8be5145a9406dc8d6d661299c9ee98aa09233df4. For running tasks they access metrics 
via the MetricQueryService, whereas for finished tasks they rely on the metrics 
stored in the ExecutionGraph.
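    The choice the handlers make between the two sources might look like the 
sketch below. IOSnapshotSketch and TaskIOLookupSketch are hypothetical names, and 
the fallback to zeros when no live value has arrived yet is an assumption, not 
something this PR specifies.

```java
import java.util.Optional;

// Hypothetical value type: an immutable copy of a task's IO counters.
final class IOSnapshotSketch {
    final long numRecordsIn;
    final long numRecordsOut;

    IOSnapshotSketch(long numRecordsIn, long numRecordsOut) {
        this.numRecordsIn = numRecordsIn;
        this.numRecordsOut = numRecordsOut;
    }
}

// Hypothetical lookup logic, not the actual handler code.
final class TaskIOLookupSketch {

    /**
     * @param finished whether the task has reached a terminal state
     * @param stored   snapshot kept alongside the execution (finished tasks)
     * @param live     latest values fetched from the query service, if any
     */
    static IOSnapshotSketch resolve(
            boolean finished, IOSnapshotSketch stored, Optional<IOSnapshotSketch> live) {
        if (finished) {
            // The query service may have missed the final update, so trust the snapshot.
            return stored;
        }
        // Assumption: report zeros until the first live values arrive.
        return live.orElse(new IOSnapshotSketch(0L, 0L));
    }
}
```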
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 4733_metrics_port

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2616
    
----
commit 5f0f3598fa5d0fdf8b61d591e2bb94b74924ee0d
Author: zentol <[email protected]>
Date:   2016-10-07T11:02:10Z

    [FLINK-4773] [metrics] [refactor] Rename IOMetricGroup to TaskIOMetricGroup

commit df40a58c74e7f0fc3feec4a5848f1627bf4537dd
Author: zentol <[email protected]>
Date:   2016-10-05T13:04:03Z

    [FLINK-4773] [metrics] [refactor] Introduce OperatorIOMetricGroup

commit 2685f6a908a0ce4cc9fe3d97beca005ea3d59ee5
Author: zentol <[email protected]>
Date:   2016-10-07T08:11:31Z

    [FLINK-4772] [metrics] Store metrics as strings in MetricStore

commit 33297e716a0a327fad20331813a582642c5e68e3
Author: zentol <[email protected]>
Date:   2016-10-07T08:16:49Z

    [FLINK-4775] [metrics] Simplify MetricStore access

commit dfed8166272b361684594f61b401c38f0d68ebd6
Author: zentol <[email protected]>
Date:   2016-10-07T11:11:58Z

    [FLINK-4774] [metrics] [hotfix] Fix scope concatenation in QueryScopeInfo

commit a7011e8305d7c828fabc4245358c2d21568fd561
Author: zentol <[email protected]>
Date:   2016-10-07T11:12:31Z

    [FLINK-4733] [metrics] Port TaskManagersHandler

commit cbff6d6aab80bc423a09aa6b62c80a2f409d796a
Author: zentol <[email protected]>
Date:   2016-10-07T11:12:41Z

    [FLINK-4733] [metrics] Remove old TaskManager metrics

commit cab25496ff5991de60e757f68c5d5139c86f34ba
Author: zentol <[email protected]>
Date:   2016-10-05T13:12:22Z

    [FLINK-4733] [metrics] Remove system accumulators

commit 16983485198a61bec0418adb833508dcaf276170
Author: zentol <[email protected]>
Date:   2016-10-07T08:15:50Z

    [FLINK-4733] [metrics] Reuse operator numRecordsIn/Out counter for task

commit 8be5145a9406dc8d6d661299c9ee98aa09233df4
Author: zentol <[email protected]>
Date:   2016-10-07T10:59:20Z

    [FLINK-4733] [metrics] Port job/task handlers

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
