+1 to keeping things simple, both in code and the model to understand. I like thinking of things as (name, value, type) triples. Historically, we've packed the entity name (e.g. PTransform name) into the string name field and parsed it out in various places; I think it's worth pulling this out and making it explicit instead, so metrics would be (name, entity, value, type) tuples. In the current proposal:
Name is the URN + a possible bytes payload. (Actually, it's a bit unclear if there's any relationship between counters with the same name and different payloads. Also, the only use for payloads is because "User Counter" is currently a single URN, rather than using the namespacing characteristics of URNs to map user names onto distinct metric names. A payload avoids the messiness of having to pack (and parse) arbitrary parameters into a name though.) If we're going to choose names that the system and sdks agree to have specific meanings, and to avoid accidental collisions, making them full-fledged documented URNs has value. Value is the "payload". Likely worth changing the name to avoid confusion with the payload above. It's bytes because it depends on the type. I would try to avoid nesting it too deeply (e.g. a payload within a payload). If we thing the types are generally limited, another option would be a oneof field (with a bytes option just in case) for transparency. There are pros and cons going this route. Type is what I proposed we add, instead of it being implicit in the name (and unknowable if one does not recognize the name). This makes things more open-ended and easier to evolve and work with. Entity could be generalized to Label, or LabelSet if desired. But as mentioned I think it makes sense to pull this out as a separate field, especially when it makes sense to aggregate a single named counter across labels as well as for a single label (e.g. execution time of composite transforms). - Robert On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com> wrote: > Hi folks - > > Before we totally go down the path of highly structured metric protos, I'd > like to propose considering a simple metrics interface between the SDK and > the runner. Something more generic and closer to what most monitoring > systems would use. > > To use Spark as an example, the Metric system uses a simple metric format > of name, value and type to report all metrics in a single structure, > regardless of the source or context of the metric. > > https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html > > The subsystems have contracts for what metrics they will expose and how > they are calculated: > > https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html > > Codifying the system metrics in the SDK seems perfectly reasonable - no > reason to make the notion of metric generic at that level. But at the > point the metric is leaving the SDK and going to the runner, a simpler, > generic encoding of the metrics might make it easier to adapt and maintain > system. The generic format can include information about downstream > consumers, if that's useful. > > Spark supports a number of Metric Sinks - external monitoring systems. If > runners receive a simple list of metrics, implementing any number of Sinks > for Beam would be straightforward and would generally be a one time > implementation. If instead all system metrics are sent embedded in a > highly structured, semantically meaningful structure, runner code would > need to be updated to support exporting the new metric. We seem to be > heading in the direction of "if you don't understand this metric, you can't > use it / export it". But most systems seem to assume metrics are really > simple named values that can be handled a priori. > > So I guess my primary question is: Is it necessary for Beam to treat > metrics as highly semantic, arbitrarily complex data? Or could they > possibly be the sort of simple named values as they are in most monitoring > systems and in Spark? With the SDK potentially providing scaffolding to > add meaning and structure, but simplifying that out before leaving SDK > code. Is the coupling to a semantically meaningful structure between the > SDK and runner and necessary complexity? > > Andrea > > > > On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com> > wrote: > >> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> wrote: >> >>> >>> *Thank you for this clarification. I think the table of files fits into >>> the model as one of type string-set (with union as aggregation). * >>> Its not a list of files, its a list of metadata for each file, several >>> pieces of data per file. >>> >>> Are you proposing that there would be separate URNs as well for each >>> entity being measured then, so the the URN defines the type of entity being >>> measured. >>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for >>> PCollection entities >>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for >>> PTransform entities >>> >> >> Yes. FWIW, it may not even be needed to put this in the name, e.g. >> execution times are never for PCollections, and even if they were it'd be >> semantically a very different beast (which should not re-use the same URN). >> >> *message MetricSpec {* >>> * // (Required) A URN that describes the accompanying payload.* >>> * // For any URN that is not recognized (by whomever is inspecting* >>> * // it) the parameter payload should be treated as opaque and* >>> * // passed as-is.* >>> * string urn = 1;* >>> >>> * // (Optional) The data specifying any parameters to the URN. If* >>> * // the URN does not require any arguments, this may be omitted.* >>> * bytes parameters_payload = 2;* >>> >>> * // (Required) A URN that describes the type of values this metric* >>> * // records (e.g. durations that should be summed).* >>> *}* >>> >>> *message Metric[Values] {* >>> * // (Required) The original requesting MetricSpec.* >>> * MetricSpec metric_spec = 1;* >>> >>> * // A mapping of entities to (encoded) values.* >>> * map<string, bytes> values;* >>> This ignores the non-unqiueness of entity identifiers. This is why in my >>> doc, I have specified the entity type and its string identifier >>> @Ken, I believe you have pointed this out in the past, that uniqueness >>> is only guaranteed within a type of entity (all PCollections), but not >>> between entities (A Pcollection and PTransform may have the same >>> identifier). >>> >> >> See above for why this is not an issue. The extra complexity (in protos >> and code), the inability to use them as map keys, and the fact that they'll >> be 100% redundant for all entities for a given metric convinces me that >> it's not worth creating and tracking an enum for the type alongside the id. >> >> >>> *}* >>> >>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com> wrote: >>>> >>>>> >>>>> To Robert's proto: >>>>> >>>>> // A mapping of entities to (encoded) values. >>>>>> map<string, bytes> values; >>>>>> >>>>> >>>>> Are the keys here the names of the metrics, aka what is used for URNs >>>>> in the doc? >>>>> >>>>>> >>>> They're the entities to which a metric is attached, e.g. a PTransform, >>>> a PCollection, or perhaps a process/worker. >>>> >>>> >>>>> } >>>>>> >>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Agree with all of this. It echoes a thread on the doc that I was >>>>>>>> going to bring here. Let's keep it simple and use concrete use cases to >>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>> >>>>>>>> Kenn >>>>>>>> >>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of >>>>>>>>> supported type" didn't include new ways of aggregating ints. As long >>>>>>>>> as >>>>>>>>> that means we have a fixed set of aggregations (that align with what >>>>>>>>> what >>>>>>>>> users want and metrics back end support) it seems like we are doing >>>>>>>>> user >>>>>>>>> metrics right. >>>>>>>>> >>>>>>>>> - Ben >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are >>>>>>>>>> used a lot but others are less mainstream so being too fine from the >>>>>>>>>> start >>>>>>>>>> can just add complexity and bugs in impls IMHO. >>>>>>>>>> >>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> a >>>>>>>>>> écrit : >>>>>>>>>> >>>>>>>>>>> By "type" of metric, I mean both the data types (including their >>>>>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as >>>>>>>>>>> would >>>>>>>>>>> double-distribution. >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> When you say type do you mean accumulator type, result type, or >>>>>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, >>>>>>>>>>>> sumlong, >>>>>>>>>>>> meanlong, etc? >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>> (and may >>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>> metrics of >>>>>>>>>>>>> standard type. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>> types based on the fact they just weren't used enough to justify >>>>>>>>>>>>>> support. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any >>>>>>>>>>>>>> special >>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating >>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would >>>>>>>>>>>>>>> propose >>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field >>>>>>>>>>>>>>> "type" which is >>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of >>>>>>>>>>>>>>> its >>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing >>>>>>>>>>>>>>> over ints >>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>>>>> opaque to >>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and >>>>>>>>>>>>>>> difficult) >>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>>>> specialization >>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>>>> well.) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every >>>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec >>>>>>>>>>>>>>> would >>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics >>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the >>>>>>>>>>>>>>>> protos a >>>>>>>>>>>>>>>> little bit. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The point of this change was to futureproof the possibility >>>>>>>>>>>>>>>> of allowing custom user metrics, with custom aggregation >>>>>>>>>>>>>>>> functions for its >>>>>>>>>>>>>>>> metric updates. >>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated >>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an >>>>>>>>>>>>>>>> approach which >>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without >>>>>>>>>>>>>>>> deserializing them. These >>>>>>>>>>>>>>>> are forwarded to user provided code which then would >>>>>>>>>>>>>>>> deserialize the metric >>>>>>>>>>>>>>>> update payloads and perform the custom aggregations. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names >>>>>>>>>>>>>>>> inside >>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the >>>>>>>>>>>>>>>> metrics as the >>>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw >>>>>>>>>>>>>>>> attention to this particular revision. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a >>>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next >>>>>>>>>>>>>>>>> week. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before >>>>>>>>>>>>>>>>> then and I will post the final decisions made to this thread >>>>>>>>>>>>>>>>> Friday >>>>>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it >>>>>>>>>>>>>>>>>> easily in >>>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this >>>>>>>>>>>>>>>>>> proposal so far. I >>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There >>>>>>>>>>>>>>>>>> were some larger >>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these >>>>>>>>>>>>>>>>>> I have added a >>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my >>>>>>>>>>>>>>>>>> recommendation as well >>>>>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>>> proposal. Please take >>>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please >>>>>>>>>>>>>>>>>> take another look after >>>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>