Hi Scott,

Thanks for the update. Both solutions look good to me, though each has its pros
and cons. I'll let the Googlers choose which is more appropriate:
- DAG modification: less intrusive in Dataflow, but the DAG executed and shown in
the Dataflow DAG UI will contain an extra step that the user might wonder about.
- Polling thread: exactly what I did for the other runners. It is more
transparent to the user but requires more infra work (it adds a container that
needs to be resilient).
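A minimal Python sketch of the polling-thread option: a background thread periodically fetches the job's metrics and pushes them to a sink, then pushes one last snapshot on shutdown. It assumes a `fetch` callable that wraps the Dataflow `getMetrics` REST call and a `sink` callable standing in for a configured metrics sink. `PollingMetricsPusher`, `flatten_metrics` and the `origin/name[context]` key format are hypothetical, and the payload handling only loosely follows the JobMetrics response shape (scalar metrics only).

```python
import threading
from typing import Any, Callable, Dict

# Hypothetical callables: fetch returns a JobMetrics-like dict,
# sink receives a flat {metric_key: value} mapping.
FetchFn = Callable[[], Dict[str, Any]]
SinkFn = Callable[[Dict[str, Any]], None]


def flatten_metrics(job_metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a JobMetrics-like payload into {"origin/name[k=v]...": scalar}."""
    flat: Dict[str, Any] = {}
    for update in job_metrics.get("metrics", []):
        if "scalar" not in update:
            continue  # this sketch only forwards scalar metrics
        name = update["name"]
        # Keep the context (e.g. the step) in the key so same-named
        # metrics from different steps do not overwrite each other.
        ctx = name.get("context", {})
        suffix = "".join(f"[{k}={v}]" for k, v in sorted(ctx.items()))
        flat[f"{name.get('origin', '')}/{name['name']}{suffix}"] = update["scalar"]
    return flat


class PollingMetricsPusher:
    """Polls a metrics source every period_s seconds and pushes to a sink."""

    def __init__(self, fetch: FetchFn, sink: SinkFn, period_s: float) -> None:
        self._fetch = fetch
        self._sink = sink
        self._period_s = period_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self) -> None:
        self._sink(flatten_metrics(self._fetch()))

    def _run(self) -> None:
        # Event.wait returns True as soon as stop() sets the event,
        # which ends the loop; otherwise it times out after period_s.
        while not self._stop.wait(self._period_s):
            try:
                self.poll_once()
            except Exception:
                pass  # a real pusher should log the error and keep polling

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self.poll_once()  # push one final snapshot after the loop exits
```

Keeping the loop outside the pipeline is what makes this option transparent to the user (no extra step in the DAG); the cost is that whatever hosts the thread has to be kept alive and restarted on failure.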
Best,
Etienne
On Friday, September 21, 2018 at 12:46 -0700, Scott Wegner wrote:
> Hi Etienne, sorry for the delay on this. I just got back from leave and found 
> this discussion.
> We haven't started implementing MetricsPusher in the Dataflow runner, mostly 
> because the Dataflow service has its own
> rich Metrics REST API and we haven't heard a need from Dataflow customers to 
> push metrics to an external backend.
> However, it would be nice to have this implemented across all runners for 
> feature parity.
> 
> I read through the discussion in JIRA [1], and the simplest implementation 
> for Dataflow may be to have a single thread
> periodically poll the Dataflow REST API [2] for latest metric values, and 
> push them to a configured sink. This polling
> thread could be hosted in a separate docker container, within the worker 
> process, or perhaps a ParDo with timers that
> gets injected into the pipeline during graph translation.
> 
> At any rate, I'm not aware of anybody currently working on this. But with the 
> Dataflow worker code being donated to
> Beam [3], soon it will be possible for anybody to contribute.
> 
> [1] https://issues.apache.org/jira/browse/BEAM-3926
> [2] https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.jobs/getMetrics
> [3] https://lists.apache.org/thread.html/2bdc645659e2fbd7e29f3a2758941faefedb01148a2a11558dfe60f8@%3Cdev.beam.apache.org%3E
> 
> On Fri, Aug 17, 2018 at 4:26 PM Lukasz Cwik <[email protected]> wrote:
> > I forwarded your request to a few people who work on the internal parts of 
> > Dataflow to see if they could help in
> > some way.
> > On Thu, Aug 16, 2018 at 6:22 AM Etienne Chauchot <[email protected]> wrote:
> > > Hi all
> > > 
> > > As we already discussed, it would be good to support MetricsPusher [1]
> > > in Dataflow (and in the other runners too, of course). Today, only Spark and Flink support it. It requires a
> > > modification to the C++ Dataflow code, so only our Google
> > > friends can do it.
> > > 
> > > Is anyone interested in doing it?
> > > 
> > > Here is the ticket https://issues.apache.org/jira/browse/BEAM-3926
> > > 
> > > Also, I wonder whether this feature should be added to the capability
> > > matrix.
> > > 
> > > [1] 
> > > https://cwiki.apache.org/confluence/display/BEAM/Metrics+architecture+inside+the+runners
> > > 
> > > Thanks
> > > Etienne
> 
> 
