I like the generalization from entity -> labels. I view the purpose of those fields to provide context. And labels feel like they supports a richer set of contexts.
The URN concept gets a little tricky. I totally agree that the context fields should not be embedded in the name. There's a "name" which is the identifier that can be used to communicate what context values are supported / allowed for metrics with that name (for example, element_count expects a ptransform ID). But then there's the context. In Stackdriver, this context is a map of key-value pairs; the type is considered metadata associated with the name, but not communicated with the value. Could the URN be "beam:namespace:name" and every metric have a map of key-value pairs for context? Not sure where this fits in the discussion or if this is handled somewhere, but allowing for a metric configuration that's provided independently of the value allows for configuring "type", "units", etc in a uniform way without having to encode them in the metric name / value. Stackdriver expects each metric type has been configured ahead of time with these annotations / metadata. Then values are reported separately. For system metrics, the definitions can be packaged with the SDK. For user metrics, they'd be defined at runtime. On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles <k...@google.com> wrote: > > > On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw <rober...@google.com> > wrote: > >> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles <k...@google.com> wrote: >> >>> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >>>> Also, the only use for payloads is because "User Counter" is currently >>>> a single URN, rather than using the namespacing characteristics of URNs to >>>> map user names onto distinct metric names. >>>> >>> >>> Can they be URNs? I don't see value in having a "user metric" URN where >>> you then have to look elsewhere for what the real name is. >>> >> >> Yes, that was my point with the parenthetical statement. I would rather >> have "beam:counter:user:use_provide_namespace:user_provide_name" than use >> the payload field for this. So if we're going to keep the payload field, we >> need more compelling usecases. >> > > Or just "beam:counter:<namespace>:<name>" or even > "beam:metric:<namespace>:<name>" since metrics have a type separate from > their name. > > Kenn > > >> A payload avoids the messiness of having to pack (and parse) arbitrary >>> parameters into a name though.) If we're going to choose names that the >>> system and sdks agree to have specific meanings, and to avoid accidental >>> collisions, making them full-fledged documented URNs has value. >>> >> >>> Value is the "payload". Likely worth changing the name to avoid >>> confusion with the payload above. It's bytes because it depends on the >>> type. I would try to avoid nesting it too deeply (e.g. a payload within a >>> payload). If we thing the types are generally limited, another option would >>> be a oneof field (with a bytes option just in case) for transparency. There >>> are pros and cons going this route. >>> >>> Type is what I proposed we add, instead of it being implicit in the name >>> (and unknowable if one does not recognize the name). This makes things more >>> open-ended and easier to evolve and work with. >>> >>> Entity could be generalized to Label, or LabelSet if desired. But as >>> mentioned I think it makes sense to pull this out as a separate field, >>> especially when it makes sense to aggregate a single named counter across >>> labels as well as for a single label (e.g. execution time of composite >>> transforms). >>> >>> - Robert >>> >>> >>> >>> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com> >>> wrote: >>> >>>> Hi folks - >>>> >>>> Before we totally go down the path of highly structured metric protos, >>>> I'd like to propose considering a simple metrics interface between the SDK >>>> and the runner. Something more generic and closer to what most monitoring >>>> systems would use. >>>> >>>> To use Spark as an example, the Metric system uses a simple metric >>>> format of name, value and type to report all metrics in a single structure, >>>> regardless of the source or context of the metric. >>>> >>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html >>>> >>>> The subsystems have contracts for what metrics they will expose and how >>>> they are calculated: >>>> >>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html >>>> >>>> Codifying the system metrics in the SDK seems perfectly reasonable - no >>>> reason to make the notion of metric generic at that level. But at the >>>> point the metric is leaving the SDK and going to the runner, a simpler, >>>> generic encoding of the metrics might make it easier to adapt and maintain >>>> system. The generic format can include information about downstream >>>> consumers, if that's useful. >>>> >>>> Spark supports a number of Metric Sinks - external monitoring systems. >>>> If runners receive a simple list of metrics, implementing any number of >>>> Sinks for Beam would be straightforward and would generally be a one time >>>> implementation. If instead all system metrics are sent embedded in a >>>> highly structured, semantically meaningful structure, runner code would >>>> need to be updated to support exporting the new metric. We seem to be >>>> heading in the direction of "if you don't understand this metric, you can't >>>> use it / export it". But most systems seem to assume metrics are really >>>> simple named values that can be handled a priori. >>>> >>>> So I guess my primary question is: Is it necessary for Beam to treat >>>> metrics as highly semantic, arbitrarily complex data? Or could they >>>> possibly be the sort of simple named values as they are in most monitoring >>>> systems and in Spark? With the SDK potentially providing scaffolding to >>>> add meaning and structure, but simplifying that out before leaving SDK >>>> code. Is the coupling to a semantically meaningful structure between the >>>> SDK and runner and necessary complexity? >>>> >>>> Andrea >>>> >>>> >>>> >>>> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com> >>>> wrote: >>>> >>>>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> >>>>> wrote: >>>>> >>>>>> >>>>>> *Thank you for this clarification. I think the table of files fits >>>>>> into the model as one of type string-set (with union as aggregation). * >>>>>> Its not a list of files, its a list of metadata for each file, >>>>>> several pieces of data per file. >>>>>> >>>>>> Are you proposing that there would be separate URNs as well for each >>>>>> entity being measured then, so the the URN defines the type of entity >>>>>> being >>>>>> measured. >>>>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for >>>>>> PCollection entities >>>>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for >>>>>> PTransform entities >>>>>> >>>>> >>>>> Yes. FWIW, it may not even be needed to put this in the name, e.g. >>>>> execution times are never for PCollections, and even if they were it'd be >>>>> semantically a very different beast (which should not re-use the same >>>>> URN). >>>>> >>>>> *message MetricSpec {* >>>>>> * // (Required) A URN that describes the accompanying payload.* >>>>>> * // For any URN that is not recognized (by whomever is inspecting* >>>>>> * // it) the parameter payload should be treated as opaque and* >>>>>> * // passed as-is.* >>>>>> * string urn = 1;* >>>>>> >>>>>> * // (Optional) The data specifying any parameters to the URN. If* >>>>>> * // the URN does not require any arguments, this may be omitted.* >>>>>> * bytes parameters_payload = 2;* >>>>>> >>>>>> * // (Required) A URN that describes the type of values this metric* >>>>>> * // records (e.g. durations that should be summed).* >>>>>> *}* >>>>>> >>>>>> *message Metric[Values] {* >>>>>> * // (Required) The original requesting MetricSpec.* >>>>>> * MetricSpec metric_spec = 1;* >>>>>> >>>>>> * // A mapping of entities to (encoded) values.* >>>>>> * map<string, bytes> values;* >>>>>> This ignores the non-unqiueness of entity identifiers. This is why in >>>>>> my doc, I have specified the entity type and its string identifier >>>>>> @Ken, I believe you have pointed this out in the past, that >>>>>> uniqueness is only guaranteed within a type of entity (all PCollections), >>>>>> but not between entities (A Pcollection and PTransform may have the same >>>>>> identifier). >>>>>> >>>>> >>>>> See above for why this is not an issue. The extra complexity (in >>>>> protos and code), the inability to use them as map keys, and the fact that >>>>> they'll be 100% redundant for all entities for a given metric convinces me >>>>> that it's not worth creating and tracking an enum for the type alongside >>>>> the id. >>>>> >>>>> >>>>>> *}* >>>>>> >>>>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com> >>>>>> wrote: >>>>>> >>>>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> To Robert's proto: >>>>>>>> >>>>>>>> // A mapping of entities to (encoded) values. >>>>>>>>> map<string, bytes> values; >>>>>>>>> >>>>>>>> >>>>>>>> Are the keys here the names of the metrics, aka what is used for >>>>>>>> URNs in the doc? >>>>>>>> >>>>>>>>> >>>>>>> They're the entities to which a metric is attached, e.g. a >>>>>>> PTransform, a PCollection, or perhaps a process/worker. >>>>>>> >>>>>>> >>>>>>>> } >>>>>>>>> >>>>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was >>>>>>>>>>> going to bring here. Let's keep it simple and use concrete use >>>>>>>>>>> cases to >>>>>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>>>>> >>>>>>>>>>> Kenn >>>>>>>>>>> >>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers < >>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics >>>>>>>>>>>> of supported type" didn't include new ways of aggregating ints. As >>>>>>>>>>>> long as >>>>>>>>>>>> that means we have a fixed set of aggregations (that align with >>>>>>>>>>>> what what >>>>>>>>>>>> users want and metrics back end support) it seems like we are >>>>>>>>>>>> doing user >>>>>>>>>>>> metrics right. >>>>>>>>>>>> >>>>>>>>>>>> - Ben >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are >>>>>>>>>>>>> used a lot but others are less mainstream so being too fine from >>>>>>>>>>>>> the start >>>>>>>>>>>>> can just add complexity and bugs in impls IMHO. >>>>>>>>>>>>> >>>>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> >>>>>>>>>>>>> a écrit : >>>>>>>>>>>>> >>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>>>> type, as >>>>>>>>>>>>>> would double-distribution. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> When you say type do you mean accumulator type, result type, >>>>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of >>>>>>>>>>>>>>> sumint, >>>>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>>>>> (and may >>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>>>>> types based on the fact they just weren't used enough to >>>>>>>>>>>>>>>>> justify support. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus >>>>>>>>>>>>>>>>> any special >>>>>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're >>>>>>>>>>>>>>>>>> conflating the idea of custom metrics and custom metric >>>>>>>>>>>>>>>>>> types. I would >>>>>>>>>>>>>>>>>> propose the MetricSpec field be augmented with an additional >>>>>>>>>>>>>>>>>> field "type" >>>>>>>>>>>>>>>>>> which is a urn specifying the type of metric it is (i.e. the >>>>>>>>>>>>>>>>>> contents of >>>>>>>>>>>>>>>>>> its payload, as well as the form of aggregation). Summing or >>>>>>>>>>>>>>>>>> maxing over >>>>>>>>>>>>>>>>>> ints would be a typical example. Though we could pursue >>>>>>>>>>>>>>>>>> making this opaque >>>>>>>>>>>>>>>>>> to the runner in the long run, that's a more speculative >>>>>>>>>>>>>>>>>> (and difficult) >>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>>>>>>> specialization >>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>>>>>>> well.) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for >>>>>>>>>>>>>>>>>> every type X one would have a single URN for UserMetric and >>>>>>>>>>>>>>>>>> it spec would >>>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) >>>>>>>>>>>>>>>>>> name. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>>>>>> I have made a revision today which is to make all >>>>>>>>>>>>>>>>>>> metrics refer to a primary entity, so I have restructured >>>>>>>>>>>>>>>>>>> some of the >>>>>>>>>>>>>>>>>>> protos a little bit. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The point of this change was to futureproof the >>>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom >>>>>>>>>>>>>>>>>>> aggregation >>>>>>>>>>>>>>>>>>> functions for its metric updates. >>>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity >>>>>>>>>>>>>>>>>>> associated with it (e.g. PCollection, PTransform), we can >>>>>>>>>>>>>>>>>>> design an >>>>>>>>>>>>>>>>>>> approach which forwards the opaque bytes metric updates, >>>>>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>> deserializing them. These are forwarded to user provided >>>>>>>>>>>>>>>>>>> code which then >>>>>>>>>>>>>>>>>>> would deserialize the metric update payloads and perform >>>>>>>>>>>>>>>>>>> the custom >>>>>>>>>>>>>>>>>>> aggregations. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform >>>>>>>>>>>>>>>>>>> names inside >>>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the >>>>>>>>>>>>>>>>>>> metrics as the >>>>>>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw >>>>>>>>>>>>>>>>>>> attention to this particular revision. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make >>>>>>>>>>>>>>>>>>>> a decision by Friday, and begin working on related PRs >>>>>>>>>>>>>>>>>>>> next week. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before >>>>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this >>>>>>>>>>>>>>>>>>>> thread Friday >>>>>>>>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it >>>>>>>>>>>>>>>>>>>>> easily in >>>>>>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on >>>>>>>>>>>>>>>>>>>>> this proposal so far. I >>>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. >>>>>>>>>>>>>>>>>>>>> There were some larger >>>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of >>>>>>>>>>>>>>>>>>>>> these I have added a >>>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my >>>>>>>>>>>>>>>>>>>>> recommendation as well >>>>>>>>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>>>>>> proposal. Please take >>>>>>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please >>>>>>>>>>>>>>>>>>>>> take another look after >>>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>