[
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478254#comment-16478254
]
Alex Amato commented on BEAM-3926:
----------------------------------
Hi Etienne, I saw your PR for the metrics pusher
([https://github.com/apache/beam/pull/4548/files]).
It's true that the Dataflow engine today handles pushing metrics to different
places inside its service.
That said, it might be appropriate to have MetricsPusher push metrics to the
Dataflow service; that seems like a reasonable use of that layer. However,
your design may assume that metrics are already aggregated before they are pushed.
Dataflow expects each worker to push its metrics (the worker's local values) to
the service, which then aggregates them.
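A rough sketch of that worker-to-service model, under stated assumptions: each worker reports its cumulative local value for a counter, and the service keeps the latest report per worker and sums them. The names MetricsService, report and aggregated are hypothetical and are not Dataflow or Beam APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the flow described above: each worker reports its
// own local value for a counter, and the service sums the reports.
// None of these names are Dataflow or Beam APIs.
final class MetricsService {
  // Latest local value reported by each worker: counter name -> (worker id -> value).
  private final Map<String, Map<String, Long>> reports = new HashMap<>();

  // Called by a worker with its worker-local (cumulative) value, not a global total.
  // A newer report from the same worker replaces the older one.
  void report(String counter, String workerId, long localValue) {
    reports.computeIfAbsent(counter, k -> new HashMap<>()).put(workerId, localValue);
  }

  // Service-side aggregate: the sum of the latest local value from every worker.
  long aggregated(String counter) {
    long total = 0;
    for (long v : reports.getOrDefault(counter, Collections.emptyMap()).values()) {
      total += v;
    }
    return total;
  }
}
```

In this model only the service ever sees the total, which is why a pusher that expects already-aggregated values would have to live on the service side (or query the service).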
Does MetricsPusher rely on a metrics container existing on the cloud-hosted
engine to collect these already-aggregated metrics, and then push them wherever
appropriate? If so, then you're right that MetricsPusher would need to be
implemented in the Dataflow service, ideally accounting for the options/sinks
you have specified.
Alternatively, a design may be possible that sends the pre-aggregated metrics
back to a worker (by querying them from the service) and then uses the same
MetricsPusher.
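A minimal sketch of that alternative, assuming the worker can obtain the aggregated counter values from the service through some query call (modeled here as a plain Supplier) and hand them to a sink. QueryingMetricsPusher and AggregatedMetricsSink are hypothetical stand-ins, not the classes from the PR or any Dataflow API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sink standing in for a MetricsPusher backend
// (e.g. an HTTP or Graphite sink); not a Beam interface.
interface AggregatedMetricsSink {
  void write(Map<String, Long> aggregatedCounters);
}

// Hypothetical worker-side pusher: instead of reading a local metrics
// container, it queries already-aggregated values from the service.
final class QueryingMetricsPusher {
  // Assumed query against the service that returns counter name -> aggregated value.
  private final Supplier<Map<String, Long>> queryService;
  private final AggregatedMetricsSink sink;

  QueryingMetricsPusher(Supplier<Map<String, Long>> queryService, AggregatedMetricsSink sink) {
    this.queryService = queryService;
    this.sink = sink;
  }

  // One push cycle; a real pusher would run this periodically.
  void pushOnce() {
    sink.write(queryService.get());
  }
}
```

The sink only ever sees the service's aggregated totals, so the pushing logic itself would not need a metrics container on the worker.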
> Support MetricsPusher in Dataflow Runner
> ----------------------------------------
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Scott Wegner
> Assignee: Pablo Estrada
> Priority: Major
>
> See [relevant email
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
> From [~echauchot]:
>
> _AFAIK Dataflow being a cloud hosted engine, the related runner is very
> different from the others. It just submits a job to the cloud hosted engine.
> So, no access to metrics container etc... from the runner. So I think that
> the MetricsPusher (component responsible for merging metrics and pushing them
> to a sink backend) must not be instantiated in DataflowRunner, otherwise it
> would be more a client (driver) piece of code and we will lose all the
> interest of being close to the execution engine (among other things
> instrumentation of the execution of the pipelines). I think that the
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>
>