I think this is a great discussion, and I'd like to respond to some of the
points raised here and raise some of my own.

First of all, I think we should be careful here not to cross boundaries. IOs
naturally have many metrics, and Beam should avoid "taking over" those. IO
metrics should focus on what's relevant to the Pipeline: input/output rate
and backlog. For UnboundedSources, backlog is currently exposed in bytes,
but for monitoring purposes we might want to consider #messages as well.

I don't agree that we should skip investing in this for Sources/Sinks and
go directly to SplittableDoFn. The IO API is familiar and well known, and
as long as we keep it, it should be treated as a first-class citizen.

As for enable/disable: if IOs stick to pipeline-related metrics, I think
we should be fine, though this could also vary between runners.

Finally, considering "split-metrics" is interesting. On one hand, it
affects the pipeline directly (unbalanced partitions in Kafka may cause
backlog). On the other hand, this sits right on that fine line of
responsibilities (Kafka monitoring would probably be able to tell you that
partitions are not balanced).
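
To make the split-level idea a bit more concrete, here is a rough sketch of
what such a metric could boil down to. This is not Beam or Kafka API: the
function name `backlog_summary`, the skew definition (largest partition
backlog divided by the mean), and the sample numbers are all my own
assumptions, purely for illustration.

```python
from statistics import mean


def backlog_summary(backlog_by_partition):
    """Summarize per-partition backlog, given as {partition_id: #messages}.

    Hypothetical helper, not part of Beam or Kafka. The skew ratio is the
    largest partition backlog divided by the mean backlog, so 1.0 means
    the partitions are perfectly balanced.
    """
    if not backlog_by_partition:
        return {"total": 0, "skew": 1.0, "worst": None}

    total = sum(backlog_by_partition.values())
    avg = mean(backlog_by_partition.values())
    # Partition with the largest backlog (first one wins on ties).
    worst = max(backlog_by_partition, key=backlog_by_partition.get)
    # With no backlog at all, report the partitions as balanced.
    skew = backlog_by_partition[worst] / avg if avg > 0 else 1.0
    return {"total": total, "skew": skew, "worst": worst}


# Made-up numbers: partition 2 is lagging far behind the other two.
summary = backlog_summary({0: 100, 1: 120, 2: 980})
```

The total is the pipeline-level number I'd expect an IO to report; the skew
is the part that arguably belongs to Kafka monitoring instead.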

My 2 cents, cheers!

On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi <[email protected]>
wrote:

> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers <[email protected]>
> wrote:
>
> >
> > > * I also think there are data source specific metrics that a given
> > > IO will want to expose (ie, things like kafka backlog for a topic.)
>
>
> UnboundedSource has API for backlog. It is better for beam/runners to
> handle backlog as well.
> Of course there will be some source specific metrics too (errors, i/o ops
> etc).
>
