+1 to making the IO metrics (e.g. producers, consumers) available as part of the Beam pipeline metrics tree for debugging and visibility.
As has already been mentioned, many IO clients have their own metrics mechanism in place, so in these cases I think it could be beneficial to mirror their metrics under the relevant subtree of the Beam metrics tree.

On Wed, Feb 15, 2017 at 12:04 AM Amit Sela wrote:

> I think this is a great discussion and I'd like to relate to some of the
> points raised here, and raise some of my own.
>
> First of all I think we should be careful here not to cross boundaries. IOs
> naturally have many metrics, and Beam should avoid "taking over" those. IO
> metrics should focus on what's relevant to the Pipeline: input/output rate,
> backlog (for UnboundedSources, which exists in bytes but for monitoring
> purposes we might want to consider #messages).
>
> I don't agree that we should not invest in doing this in Sources/Sinks and
> going directly to SplittableDoFn because the IO API is familiar and known,
> and as long as we keep it should be treated as a first class citizen.
>
> As for enable/disable - if IOs consider focusing on pipeline-related
> metrics I think we should be fine, though this could also change between
> runners as well.
>
> Finally, considering "split-metrics" is interesting because on one hand it
> affects the pipeline directly (unbalanced partitions in Kafka that may
> cause backlog) but this is that fine-line of responsibilities (Kafka
> monitoring would probably be able to tell you that partitions are not
> balanced).
>
> My 2 cents, cheers!
>
> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi wrote:
>
> > On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers wrote:
> >
> > > * I also think there are data source specific metrics that a given IO
> > > will want to expose (ie, things like kafka backlog for a topic.)
> >
> > UnboundedSource has API for backlog. It is better for beam/runners to
> > handle backlog as well.
> > Of course there will be some source specific metrics too (errors, i/o ops
> > etc).
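To make the mirroring suggestion a bit more concrete, here is a minimal sketch in Python. `MetricsTree` and `mirror_client_metrics` are hypothetical names, not part of Beam's metrics API. The tree is modeled as a flat map of slash-separated paths to gauge values. The optional `include` allowlist reflects Amit's point that Beam should surface only the pipeline-relevant IO metrics rather than take over everything the client reports. The metric names and values in the usage section are illustrative only, modeled on Kafka consumer metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping, Optional


@dataclass
class MetricsTree:
    """Hypothetical stand-in for a pipeline metrics tree: a flat map of
    slash-separated paths to the latest reported gauge value."""
    gauges: Dict[str, float] = field(default_factory=dict)

    def set_gauge(self, path: str, value: float) -> None:
        # Overwrite with the latest value, as a gauge would.
        self.gauges[path] = value

    def subtree(self, prefix: str) -> Dict[str, float]:
        # Return every gauge whose path lies under `prefix`.
        root = prefix.rstrip("/") + "/"
        return {p: v for p, v in self.gauges.items() if p.startswith(root)}


def mirror_client_metrics(tree: MetricsTree, step: str, io_name: str,
                          client_metrics: Mapping[str, float],
                          include: Optional[Iterable[str]] = None) -> None:
    """Copy a snapshot of an IO client's own metrics under
    <step>/io/<io_name>/ in the tree. If `include` is given, only those
    metric names are mirrored (hypothetical allowlist)."""
    allowed = set(include) if include is not None else None
    for name, value in client_metrics.items():
        if allowed is not None and name not in allowed:
            continue  # skip client metrics not relevant to the pipeline
        tree.set_gauge(f"{step}/io/{io_name}/{name}", float(value))


if __name__ == "__main__":
    tree = MetricsTree()
    # Illustrative snapshot; names modeled on Kafka consumer metrics.
    snapshot = {"records-consumed-rate": 1250.0,
                "records-lag-max": 42,
                "request-rate": 3.0}
    mirror_client_metrics(tree, "ReadFromKafka", "kafka-consumer", snapshot,
                          include={"records-consumed-rate", "records-lag-max"})
    for path, value in sorted(tree.subtree("ReadFromKafka/io").items()):
        print(path, value)
```

In a real runner the snapshot would be refreshed periodically from the client. The allowlist is one way to keep the mirrored subtree small while still giving visibility into, for example, consumer lag.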
