Hi Ismaël,

You've raised some great points.
Please see my comments inline.

On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía <[email protected]> wrote:

> Hello,
>
> The new metrics API allows us to integrate some basic metrics into the Beam
> IOs. I have been following some discussions about this on JIRAs/PRs, and I
> think it is important to discuss the subject here so we can have more
> awareness and obtain ideas from the community.
>
> First I want to thank Ben for his work on the metrics API, and Aviem for
> his ongoing work on metrics for IOs (e.g. KafkaIO), which made me aware of
> this subject.
>
> There are some basic questions to discuss, e.g.:
>
> - What are the responsibilities of Beam IOs in terms of Metrics
> (considering the fact that the actual IOs, server + client, usually provide
> their own)?
>

While it is true that many IOs provide their own metrics, I think that Beam
should expose IO metrics because:

   1. Metrics that help in understanding the performance of a pipeline that
   uses an IO may not be covered by the IO's own metrics.
   2. Users may not be able to set up integrations with the IO's metrics to
   view them effectively (and correlate them with a specific Beam pipeline),
   but may still want to investigate their pipeline's performance.


> - What metrics are relevant to the pipeline (or some particular IOs)? Kafka
> backlog, for one, could indicate that a pipeline is falling behind the
> ingestion rate.


I think it depends on the IO, but some metrics probably overlap across IOs,
so a common guideline could be written for those.
I listed what I thought should be reported for KafkaIO in the following
JIRA: https://issues.apache.org/jira/browse/BEAM-1398
Feel free to add more metrics you think are important to report.
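
To make the backlog idea concrete, here is a minimal sketch. It assumes
backlog is defined as the sum, over partitions, of (latest available offset
minus next offset to be read); that definition is my assumption, not something
taken from the JIRA. The names `BacklogSketch` and `backlog` are hypothetical
and are not part of KafkaIO or the Beam metrics API.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of KafkaIO or the Beam metrics API.
final class BacklogSketch {
  private BacklogSketch() {}

  /**
   * Sums (latest offset - consumed offset) over all partitions.
   * Assumption: a partition missing from consumedOffsets is treated as unread from offset 0.
   */
  static long backlog(Map<Integer, Long> latestOffsets, Map<Integer, Long> consumedOffsets) {
    long total = 0L;
    for (Map.Entry<Integer, Long> entry : latestOffsets.entrySet()) {
      long consumed = consumedOffsets.getOrDefault(entry.getKey(), 0L);
      // Clamp at zero so a stale "latest" snapshot never yields a negative backlog.
      total += Math.max(0L, entry.getValue() - consumed);
    }
    return total;
  }
}
```

A value that keeps growing between reports would suggest the pipeline is
falling behind the ingestion rate.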


>
>
> - Should metrics be calculated on IOs by default or not?
> - If metrics are defined by default does it make sense to allow users to
> disable them?
>

IIUC, your concern is that metrics will add overhead to the pipeline, and
that pipelines which are highly sensitive to this will be hampered?
In any case, I think that yes, metrics calculation should be configurable
(enable/disable).
In the Spark runner, for example, the metrics sink feature (not the metrics
calculation itself, but the sinks the metrics are sent to) is configurable in
the pipeline options.
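
As a rough illustration of the enable/disable idea, here is a minimal,
self-contained sketch in plain Java. It assumes the toggle is a boolean read
once when the reader is constructed. All names here (`RecordCounter`,
`EnabledCounter`, `DisabledCounter`, `SketchReader`) are hypothetical; this is
not the Beam `Counter` type and not how any existing IO is implemented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface, loosely modelled on a counter metric; not the Beam Counter type.
interface RecordCounter {
  void inc();
  long value();
}

// Counts every increment; used when metrics are enabled.
final class EnabledCounter implements RecordCounter {
  private final AtomicLong count = new AtomicLong();
  @Override public void inc() { count.incrementAndGet(); }
  @Override public long value() { return count.get(); }
}

// No-op used when metrics are disabled, so the read path does no counting work.
final class DisabledCounter implements RecordCounter {
  @Override public void inc() {}
  @Override public long value() { return 0L; }
}

// Hypothetical reader that picks its counter once, at construction time.
final class SketchReader {
  private final RecordCounter recordsRead;

  SketchReader(boolean metricsEnabled) {
    this.recordsRead = metricsEnabled ? new EnabledCounter() : new DisabledCounter();
  }

  void readRecord() {
    // ... the actual read from the source would happen here ...
    recordsRead.inc();
  }

  long recordsRead() {
    return recordsRead.value();
  }
}
```

Choosing the implementation once keeps the per-record path free of an
"is it enabled?" check, which is the part a latency-sensitive pipeline would
care about.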


> Well these are just some questions around the subject so we can create a
> common set of practices to include metrics in the IOs and eventually
> improve the transform guide with this. What do you think about this? Do you
> have other questions/ideas?
>
> Thanks,
> Ismaël
>
