On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw <rober...@google.com> wrote:
> Also, the only use for payloads is because "User Counter" is currently a > single URN, rather than using the namespacing characteristics of URNs to > map user names onto distinct metric names. > Can they be URNs? I don't see value in having a "user metric" URN where you then have to look elsewhere for what the real name is. Kenn A payload avoids the messiness of having to pack (and parse) arbitrary > parameters into a name though.) If we're going to choose names that the > system and sdks agree to have specific meanings, and to avoid accidental > collisions, making them full-fledged documented URNs has value. > > Value is the "payload". Likely worth changing the name to avoid confusion > with the payload above. It's bytes because it depends on the type. I would > try to avoid nesting it too deeply (e.g. a payload within a payload). If we > thing the types are generally limited, another option would be a oneof > field (with a bytes option just in case) for transparency. There are pros > and cons going this route. > > Type is what I proposed we add, instead of it being implicit in the name > (and unknowable if one does not recognize the name). This makes things more > open-ended and easier to evolve and work with. > > Entity could be generalized to Label, or LabelSet if desired. But as > mentioned I think it makes sense to pull this out as a separate field, > especially when it makes sense to aggregate a single named counter across > labels as well as for a single label (e.g. execution time of composite > transforms). > > - Robert > > > > On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com> > wrote: > >> Hi folks - >> >> Before we totally go down the path of highly structured metric protos, >> I'd like to propose considering a simple metrics interface between the SDK >> and the runner. Something more generic and closer to what most monitoring >> systems would use. >> >> To use Spark as an example, the Metric system uses a simple metric format >> of name, value and type to report all metrics in a single structure, >> regardless of the source or context of the metric. >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html >> >> The subsystems have contracts for what metrics they will expose and how >> they are calculated: >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html >> >> Codifying the system metrics in the SDK seems perfectly reasonable - no >> reason to make the notion of metric generic at that level. But at the >> point the metric is leaving the SDK and going to the runner, a simpler, >> generic encoding of the metrics might make it easier to adapt and maintain >> system. The generic format can include information about downstream >> consumers, if that's useful. >> >> Spark supports a number of Metric Sinks - external monitoring systems. >> If runners receive a simple list of metrics, implementing any number of >> Sinks for Beam would be straightforward and would generally be a one time >> implementation. If instead all system metrics are sent embedded in a >> highly structured, semantically meaningful structure, runner code would >> need to be updated to support exporting the new metric. We seem to be >> heading in the direction of "if you don't understand this metric, you can't >> use it / export it". But most systems seem to assume metrics are really >> simple named values that can be handled a priori. >> >> So I guess my primary question is: Is it necessary for Beam to treat >> metrics as highly semantic, arbitrarily complex data? Or could they >> possibly be the sort of simple named values as they are in most monitoring >> systems and in Spark? With the SDK potentially providing scaffolding to >> add meaning and structure, but simplifying that out before leaving SDK >> code. Is the coupling to a semantically meaningful structure between the >> SDK and runner and necessary complexity? >> >> Andrea >> >> >> >> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com> >> wrote: >> >>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> wrote: >>> >>>> >>>> *Thank you for this clarification. I think the table of files fits into >>>> the model as one of type string-set (with union as aggregation). * >>>> Its not a list of files, its a list of metadata for each file, several >>>> pieces of data per file. >>>> >>>> Are you proposing that there would be separate URNs as well for each >>>> entity being measured then, so the the URN defines the type of entity being >>>> measured. >>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for >>>> PCollection entities >>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for >>>> PTransform entities >>>> >>> >>> Yes. FWIW, it may not even be needed to put this in the name, e.g. >>> execution times are never for PCollections, and even if they were it'd be >>> semantically a very different beast (which should not re-use the same URN). >>> >>> *message MetricSpec {* >>>> * // (Required) A URN that describes the accompanying payload.* >>>> * // For any URN that is not recognized (by whomever is inspecting* >>>> * // it) the parameter payload should be treated as opaque and* >>>> * // passed as-is.* >>>> * string urn = 1;* >>>> >>>> * // (Optional) The data specifying any parameters to the URN. If* >>>> * // the URN does not require any arguments, this may be omitted.* >>>> * bytes parameters_payload = 2;* >>>> >>>> * // (Required) A URN that describes the type of values this metric* >>>> * // records (e.g. durations that should be summed).* >>>> *}* >>>> >>>> *message Metric[Values] {* >>>> * // (Required) The original requesting MetricSpec.* >>>> * MetricSpec metric_spec = 1;* >>>> >>>> * // A mapping of entities to (encoded) values.* >>>> * map<string, bytes> values;* >>>> This ignores the non-unqiueness of entity identifiers. This is why in >>>> my doc, I have specified the entity type and its string identifier >>>> @Ken, I believe you have pointed this out in the past, that uniqueness >>>> is only guaranteed within a type of entity (all PCollections), but not >>>> between entities (A Pcollection and PTransform may have the same >>>> identifier). >>>> >>> >>> See above for why this is not an issue. The extra complexity (in protos >>> and code), the inability to use them as map keys, and the fact that they'll >>> be 100% redundant for all entities for a given metric convinces me that >>> it's not worth creating and tracking an enum for the type alongside the id. >>> >>> >>>> *}* >>>> >>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com> >>>> wrote: >>>> >>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com> >>>>> wrote: >>>>> >>>>>> >>>>>> To Robert's proto: >>>>>> >>>>>> // A mapping of entities to (encoded) values. >>>>>>> map<string, bytes> values; >>>>>>> >>>>>> >>>>>> Are the keys here the names of the metrics, aka what is used for URNs >>>>>> in the doc? >>>>>> >>>>>>> >>>>> They're the entities to which a metric is attached, e.g. a PTransform, >>>>> a PCollection, or perhaps a process/worker. >>>>> >>>>> >>>>>> } >>>>>>> >>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was >>>>>>>>> going to bring here. Let's keep it simple and use concrete use cases >>>>>>>>> to >>>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>>> >>>>>>>>> Kenn >>>>>>>>> >>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of >>>>>>>>>> supported type" didn't include new ways of aggregating ints. As long >>>>>>>>>> as >>>>>>>>>> that means we have a fixed set of aggregations (that align with what >>>>>>>>>> what >>>>>>>>>> users want and metrics back end support) it seems like we are doing >>>>>>>>>> user >>>>>>>>>> metrics right. >>>>>>>>>> >>>>>>>>>> - Ben >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are >>>>>>>>>>> used a lot but others are less mainstream so being too fine from >>>>>>>>>>> the start >>>>>>>>>>> can just add complexity and bugs in impls IMHO. >>>>>>>>>>> >>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> >>>>>>>>>>> a écrit : >>>>>>>>>>> >>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>> type, as >>>>>>>>>>>> would double-distribution. >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> When you say type do you mean accumulator type, result type, >>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of >>>>>>>>>>>>> sumint, >>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>>> (and may >>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>>> types based on the fact they just weren't used enough to >>>>>>>>>>>>>>> justify support. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus >>>>>>>>>>>>>>> any special >>>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating >>>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would >>>>>>>>>>>>>>>> propose >>>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field >>>>>>>>>>>>>>>> "type" which is >>>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents >>>>>>>>>>>>>>>> of its >>>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or >>>>>>>>>>>>>>>> maxing over ints >>>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>>>>>> opaque to >>>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and >>>>>>>>>>>>>>>> difficult) >>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>>>>> specialization >>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>>>>> well.) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every >>>>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics >>>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the >>>>>>>>>>>>>>>>> protos a >>>>>>>>>>>>>>>>> little bit. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The point of this change was to futureproof the >>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom >>>>>>>>>>>>>>>>> aggregation >>>>>>>>>>>>>>>>> functions for its metric updates. >>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated >>>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an >>>>>>>>>>>>>>>>> approach which >>>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without >>>>>>>>>>>>>>>>> deserializing them. These >>>>>>>>>>>>>>>>> are forwarded to user provided code which then would >>>>>>>>>>>>>>>>> deserialize the metric >>>>>>>>>>>>>>>>> update payloads and perform the custom aggregations. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names >>>>>>>>>>>>>>>>> inside >>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the >>>>>>>>>>>>>>>>> metrics as the >>>>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw >>>>>>>>>>>>>>>>> attention to this particular revision. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a >>>>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next >>>>>>>>>>>>>>>>>> week. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before >>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this thread >>>>>>>>>>>>>>>>>> Friday >>>>>>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it >>>>>>>>>>>>>>>>>>> easily in >>>>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this >>>>>>>>>>>>>>>>>>> proposal so far. I >>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There >>>>>>>>>>>>>>>>>>> were some larger >>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of >>>>>>>>>>>>>>>>>>> these I have added a >>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my >>>>>>>>>>>>>>>>>>> recommendation as well >>>>>>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>>>> proposal. Please take >>>>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please >>>>>>>>>>>>>>>>>>> take another look after >>>>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>