[https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478254#comment-16478254]

Alex Amato commented on BEAM-3926:
----------------------------------

Hi Etienne, I saw your PR for the MetricsPusher 
([https://github.com/apache/beam/pull/4548/files]).

It's true that the Dataflow engine today handles pushing metrics to different 
places inside its service.

That said, it might still be appropriate to have MetricsPusher push metrics to 
the Dataflow service; that seems like an appropriate use of that layer. However, 
your design may assume that metrics are already aggregated before pushing. 
Dataflow expects each worker to push its metrics (the worker's local values) to 
the service, which then aggregates them.

Does MetricsPusher rely on a metrics container existing on the cloud-hosted 
engine to collect these already-aggregated metrics, and then push them wherever 
appropriate? If so, you're right that MetricsPusher would need to be 
implemented in the Dataflow service, ideally accounting for the options/sinks 
you have specified.

Alternatively, a design might be possible that sends the already-aggregated 
metrics back to a worker (by querying them from the service), which could then 
use the same MetricsPusher.
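To make the two aggregation models in this discussion concrete, here is a 
minimal Java sketch, assuming simple long counters keyed by name: workers report 
local values, a service-side step sums them, and a pusher only forwards the 
already-aggregated snapshot to a sink, roughly the flow suggested above. All 
names (MetricsFlowSketch, aggregate, push) are hypothetical illustrations, not 
Beam or Dataflow APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: workers report local (per-worker) counter values, a
 * central service sums them, and a pusher forwards the aggregated snapshot to
 * a sink. None of these names are Beam or Dataflow APIs.
 */
public class MetricsFlowSketch {

  /** Service side: sums each counter's local values across all worker reports. */
  static Map<String, Long> aggregate(List<Map<String, Long>> workerReports) {
    Map<String, Long> totals = new HashMap<>();
    for (Map<String, Long> report : workerReports) {
      report.forEach((name, value) -> totals.merge(name, value, Long::sum));
    }
    return totals;
  }

  /** Pusher side: hands an immutable copy of the aggregated snapshot to a sink; no merging here. */
  static void push(Map<String, Long> aggregated, Consumer<Map<String, Long>> sink) {
    sink.accept(Map.copyOf(aggregated));
  }

  public static void main(String[] args) {
    // Two workers report their local values for the same counter.
    List<Map<String, Long>> reports = List.of(
        Map.of("elementsRead", 10L, "errors", 1L),
        Map.of("elementsRead", 32L));
    Map<String, Long> totals = aggregate(reports);
    // The "sink" here just prints the aggregated counter.
    push(totals, snapshot -> System.out.println(snapshot.get("elementsRead")));
  }
}
```

Running main prints 42 (10 + 32). Keeping aggregation out of push() mirrors the 
question above: whichever component holds the aggregated values is the natural 
place to run the pusher.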

> Support MetricsPusher in Dataflow Runner
> ----------------------------------------
>
>                 Key: BEAM-3926
>                 URL: https://issues.apache.org/jira/browse/BEAM-3926
>             Project: Beam
>          Issue Type: Sub-task
>          Components: runner-dataflow
>            Reporter: Scott Wegner
>            Assignee: Pablo Estrada
>            Priority: Major
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK, Dataflow being a cloud-hosted engine, the related runner is very 
> different from the others: it just submits a job to the cloud-hosted engine, 
> so the runner has no access to the metrics container, etc. So I think that 
> the MetricsPusher (the component responsible for merging metrics and pushing 
> them to a sink backend) must not be instantiated in DataflowRunner; otherwise 
> it would be more of a client (driver) piece of code, and we would lose all the 
> benefit of being close to the execution engine (among other things, 
> instrumentation of the execution of the pipelines). I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
