[jira] [Work logged] (BEAM-7605) Provide a way for user code to read dataflow runner stats

ASF GitHub Bot (JIRA) Thu, 27 Jun 2019 01:00:26 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-7605?focusedWorklogId=268348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268348
 ]


ASF GitHub Bot logged work on BEAM-7605:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jun/19 07:59
            Start Date: 27/Jun/19 07:59
    Worklog Time Spent: 10m 
      Work Description: echauchot commented on issue #8913: [BEAM-7605] Allow 
user-code to read counters from the dataflow worker
URL: https://github.com/apache/beam/pull/8913#issuecomment-506237890
 
 
   > On Tue, Jun 25, 2019 at 7:25 AM Steven Niemitz ***@***.***> wrote: Thanks 
for the reply! you're right, for Spark MetricsPusher thread will be 
instantiated on the Driver machine and Flink MetricsPusher thread will be 
instantiated on the JobManager machine. Indeed it pushes aggregated user 
metrics every each x seconds to a configured sink Looking at it, it doesn't 
seem like it'd be a large amount of work to have it run inside of the worker 
process, rather than on the submitter. That being said, if what you want are 
non-aggregated system metrics, MetricsPusher does not do that currently, it 
would need to be enhanced. Agreed. It doesn't seem like a large amount of work 
though. Even if it is aggregating, it'll only be aggregating a single worker 
anyways, so its essentially non-aggregated. But the problem is that Dataflow 
Beam runner is a client nutshell that just delegates the run of a serialized 
pipeline to the remote cloud hosted Dataflow engine (see BEAM-3926), so it 
would require to code on the Dataflow engine side a MetricsPusher-like service. 
The Dataflow worker code was opensourced and lives here:
   > 
https://github.com/apache/beam/tree/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker
   > […](#)
   > That is what this PR is actually adding (kind of). CounterUpdateReceiver 
is called into from the dataflow worker with the counters it has collected over 
that period. This contains all system and user metrics that were collected, 
albeit in a different format. There are (probably?) metrics that the windmill 
service collects (and pushes to the dataflow metrics collection service) that 
this doesn't capture, but its good enough for now, and anything is better than 
nothing here imo. Here's what I could see going forward: - "somehow" adapt the 
dataflow metrics container to a MetricsContainerStepMap. It seems like 
implementing a MetricsContainerStepMap that wraps the dataflow metrics ( 
pendingDeltaCounters and pendingCumulativeCounters) would be the way to go 
here. - figure out how to indicate "where" to push metrics from. It looks like 
dataflow doesn't support MetricsPusher at all, so there's no backwards 
compatibility problem where, but it might be confusing that some runners push 
aggregated metrics from the submitter, and some non-aggregated from the worker. 
Maybe another flag that indicates aggregated vs non-aggregated? Interested to 
hear your thoughts! — You are receiving this because you were mentioned. Reply 
to this email directly, view it on GitHub 
<#8913?email_source=notifications&email_token=ACM4V3DUJ4MBIEI2RNBWRILP4ITG5A5CNFSM4HZTTQW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYQNR6Q#issuecomment-505469178>,
 or mute the thread 
<https://github.com/notifications/unsubscribe-auth/ACM4V3GN4ZYEZCFQ3XDI77DP4ITG5ANCNFSM4HZTTQWQ>
 .
   
   I thought it was on another part of Dataflow that was not opensourced
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 268348)
    Time Spent: 1h 50m  (was: 1h 40m)

> Provide a way for user code to read dataflow runner stats
> ---------------------------------------------------------
>
>                 Key: BEAM-7605
>                 URL: https://issues.apache.org/jira/browse/BEAM-7605
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Steve Niemitz
>            Assignee: Steve Niemitz
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The dataflow runner collects (and publishes to the dataflow service) a large 
> number of useful stats.  While these can be polled from the dataflow service 
> via its API, there are a few downsides to this:
>  * it requires another process to poll and collect the stats
>  * the stats are aggregated across all workers, so per-worker stats are lost
> It would be simple to provide a hook to allow users to receive stats updates 
> as well, and then do whatever they want with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (BEAM-7605) Provide a way for user code to read dataflow runner stats

Reply via email to