Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Andrea Foegler Fri, 13 Apr 2018 14:03:05 -0700

I like the generalization from entity -> labels.  I view the purpose of
those fields to provide context.  And labels feel like they supports a
richer set of contexts.


The URN concept gets a little tricky.  I totally agree that the context
fields should not be embedded in the name.
There's a "name" which is the identifier that can be used to communicate
what context values are supported / allowed for metrics with that name (for
example, element_count expects a ptransform ID).  But then there's the
context.  In Stackdriver, this context is a map of key-value pairs; the
type is considered metadata associated with the name, but not communicated
with the value.  Could the URN be "beam:namespace:name" and every metric
have a map of key-value pairs for context?

Not sure where this fits in the discussion or if this is handled somewhere,
but allowing for a metric configuration that's provided independently of
the value allows for configuring "type", "units", etc in a uniform way
without having to encode them in the metric name / value.  Stackdriver
expects each metric type has been configured ahead of time with these
annotations / metadata.  Then values are reported separately.  For system
metrics, the definitions can be packaged with the SDK.  For user metrics,
they'd be defined at runtime.



On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles <k...@google.com> wrote:

>
>
> On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles <k...@google.com> wrote:
>>
>>> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> Also, the only use for payloads is because "User Counter" is currently
>>>> a single URN, rather than using the namespacing characteristics of URNs to
>>>> map user names onto distinct metric names.
>>>>
>>>
>>> Can they be URNs? I don't see value in having a "user metric" URN where
>>> you then have to look elsewhere for what the real name is.
>>>
>>
>> Yes, that was my point with the parenthetical statement. I would rather
>> have "beam:counter:user:use_provide_namespace:user_provide_name" than use
>> the payload field for this. So if we're going to keep the payload field, we
>> need more compelling usecases.
>>
>
> Or just "beam:counter:<namespace>:<name>" or even
> "beam:metric:<namespace>:<name>" since metrics have a type separate from
> their name.
>
> Kenn
>
>
>> A payload avoids the messiness of having to pack (and parse) arbitrary
>>> parameters into a name though.) If we're going to choose names that the
>>> system and sdks agree to have specific meanings, and to avoid accidental
>>> collisions, making them full-fledged documented URNs has value.
>>>
>>
>>> Value is the "payload". Likely worth changing the name to avoid
>>> confusion with the payload above. It's bytes because it depends on the
>>> type. I would try to avoid nesting it too deeply (e.g. a payload within a
>>> payload). If we thing the types are generally limited, another option would
>>> be a oneof field (with a bytes option just in case) for transparency. There
>>> are pros and cons going this route.
>>>
>>> Type is what I proposed we add, instead of it being implicit in the name
>>> (and unknowable if one does not recognize the name). This makes things more
>>> open-ended and easier to evolve and work with.
>>>
>>> Entity could be generalized to Label, or LabelSet if desired. But as
>>> mentioned I think it makes sense to pull this out as a separate field,
>>> especially when it makes sense to aggregate a single named counter across
>>> labels as well as for a single label (e.g. execution time of composite
>>> transforms).
>>>
>>> - Robert
>>>
>>>
>>>
>>> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com>
>>> wrote:
>>>
>>>> Hi folks -
>>>>
>>>> Before we totally go down the path of highly structured metric protos,
>>>> I'd like to propose considering a simple metrics interface between the SDK
>>>> and the runner.  Something more generic and closer to what most monitoring
>>>> systems would use.
>>>>
>>>> To use Spark as an example, the Metric system uses a simple metric
>>>> format of name, value and type to report all metrics in a single structure,
>>>> regardless of the source or context of the metric.
>>>>
>>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html
>>>>
>>>> The subsystems have contracts for what metrics they will expose and how
>>>> they are calculated:
>>>>
>>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html
>>>>
>>>> Codifying the system metrics in the SDK seems perfectly reasonable - no
>>>> reason to make the notion of metric generic at that level.  But at the
>>>> point the metric is leaving the SDK and going to the runner, a simpler,
>>>> generic encoding of the metrics might make it easier to adapt and maintain
>>>> system.  The generic format can include information about downstream
>>>> consumers, if that's useful.
>>>>
>>>> Spark supports a number of Metric Sinks - external monitoring systems.
>>>> If runners receive a simple list of metrics, implementing any number of
>>>> Sinks for Beam would be straightforward and would generally be a one time
>>>> implementation.  If instead all system metrics are sent embedded in a
>>>> highly structured, semantically meaningful structure, runner code would
>>>> need to be updated to support exporting the new metric. We seem to be
>>>> heading in the direction of "if you don't understand this metric, you can't
>>>> use it / export it".  But most systems seem to assume metrics are really
>>>> simple named values that can be handled a priori.
>>>>
>>>> So I guess my primary question is:  Is it necessary for Beam to treat
>>>> metrics as highly semantic, arbitrarily complex data?  Or could they
>>>> possibly be the sort of simple named values as they are in most monitoring
>>>> systems and in Spark?  With the SDK potentially providing scaffolding to
>>>> add meaning and structure, but simplifying that out before leaving SDK
>>>> code.  Is the coupling to a semantically meaningful structure between the
>>>> SDK and runner and necessary complexity?
>>>>
>>>> Andrea
>>>>
>>>>
>>>>
>>>> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> *Thank you for this clarification. I think the table of files fits
>>>>>> into the model as one of type string-set (with union as aggregation). *
>>>>>> Its not a list of files, its a list of metadata for each file,
>>>>>> several pieces of data per file.
>>>>>>
>>>>>> Are you proposing that there would be separate URNs as well for each
>>>>>> entity being measured then, so the the URN defines the type of entity 
>>>>>> being
>>>>>> measured.
>>>>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for
>>>>>> PCollection entities
>>>>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for
>>>>>> PTransform entities
>>>>>>
>>>>>
>>>>> Yes. FWIW, it may not even be needed to put this in the name, e.g.
>>>>> execution times are never for PCollections, and even if they were it'd be
>>>>> semantically a very different beast (which should not re-use the same 
>>>>> URN).
>>>>>
>>>>> *message MetricSpec {*
>>>>>> *  // (Required) A URN that describes the accompanying payload.*
>>>>>> *  // For any URN that is not recognized (by whomever is inspecting*
>>>>>> *  // it) the parameter payload should be treated as opaque and*
>>>>>> *  // passed as-is.*
>>>>>> *  string urn = 1;*
>>>>>>
>>>>>> *  // (Optional) The data specifying any parameters to the URN. If*
>>>>>> *  // the URN does not require any arguments, this may be omitted.*
>>>>>> *  bytes parameters_payload = 2;*
>>>>>>
>>>>>> *  // (Required) A URN that describes the type of values this metric*
>>>>>> *  // records (e.g. durations that should be summed).*
>>>>>> *}*
>>>>>>
>>>>>> *message Metric[Values] {*
>>>>>> * // (Required) The original requesting MetricSpec.*
>>>>>> * MetricSpec metric_spec = 1;*
>>>>>>
>>>>>> * // A mapping of entities to (encoded) values.*
>>>>>> * map<string, bytes> values;*
>>>>>> This ignores the non-unqiueness of entity identifiers. This is why in
>>>>>> my doc, I have specified the entity type and its string identifier
>>>>>> @Ken, I believe you have pointed this out in the past, that
>>>>>> uniqueness is only guaranteed within a type of entity (all PCollections),
>>>>>> but not between entities (A Pcollection and PTransform may have the same
>>>>>> identifier).
>>>>>>
>>>>>
>>>>> See above for why this is not an issue. The extra complexity (in
>>>>> protos and code), the inability to use them as map keys, and the fact that
>>>>> they'll be 100% redundant for all entities for a given metric convinces me
>>>>> that it's not worth creating and tracking an enum for the type alongside
>>>>> the id.
>>>>>
>>>>>
>>>>>> *}*
>>>>>>
>>>>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> To Robert's proto:
>>>>>>>>
>>>>>>>>  // A mapping of entities to (encoded) values.
>>>>>>>>>  map<string, bytes> values;
>>>>>>>>>
>>>>>>>>
>>>>>>>> Are the keys here the names of the metrics, aka what is used for
>>>>>>>> URNs in the doc?
>>>>>>>>
>>>>>>>>>
>>>>>>> They're the entities to which a metric is attached, e.g. a
>>>>>>> PTransform, a PCollection, or perhaps a process/worker.
>>>>>>>
>>>>>>>
>>>>>>>> }
>>>>>>>>>
>>>>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was
>>>>>>>>>>> going to bring here. Let's keep it simple and use concrete use 
>>>>>>>>>>> cases to
>>>>>>>>>>> drive additional abstraction if/when it becomes compelling.
>>>>>>>>>>>
>>>>>>>>>>> Kenn
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <
>>>>>>>>>>> bjchamb...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics
>>>>>>>>>>>> of supported type" didn't include new ways of aggregating ints. As 
>>>>>>>>>>>> long as
>>>>>>>>>>>> that means we have a fixed set of aggregations (that align with 
>>>>>>>>>>>> what what
>>>>>>>>>>>> users want and metrics back end support) it seems like we are 
>>>>>>>>>>>> doing user
>>>>>>>>>>>> metrics right.
>>>>>>>>>>>>
>>>>>>>>>>>> - Ben
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are
>>>>>>>>>>>>> used a lot but others are less mainstream so being too fine from 
>>>>>>>>>>>>> the start
>>>>>>>>>>>>> can just add complexity and bugs in impls IMHO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com>
>>>>>>>>>>>>> a écrit :
>>>>>>>>>>>>>
>>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including
>>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a 
>>>>>>>>>>>>>> type, as
>>>>>>>>>>>>>> would double-distribution.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <
>>>>>>>>>>>>>> bjchamb...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When you say type do you mean accumulator type, result type,
>>>>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of 
>>>>>>>>>>>>>>> sumint,
>>>>>>>>>>>>>>> sumlong, meanlong, etc?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <
>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and
>>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road 
>>>>>>>>>>>>>>>> (and may
>>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom 
>>>>>>>>>>>>>>>> metrics of
>>>>>>>>>>>>>>>> standard type.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <
>>>>>>>>>>>>>>>> bchamb...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric
>>>>>>>>>>>>>>>>> types based on the fact they just weren't used enough to 
>>>>>>>>>>>>>>>>> justify support.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back?
>>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus 
>>>>>>>>>>>>>>>>> any special
>>>>>>>>>>>>>>>>> system metrivs?
>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're
>>>>>>>>>>>>>>>>>> conflating the idea of custom metrics and custom metric 
>>>>>>>>>>>>>>>>>> types. I would
>>>>>>>>>>>>>>>>>> propose the MetricSpec field be augmented with an additional 
>>>>>>>>>>>>>>>>>> field "type"
>>>>>>>>>>>>>>>>>> which is a urn specifying the type of metric it is (i.e. the 
>>>>>>>>>>>>>>>>>> contents of
>>>>>>>>>>>>>>>>>> its payload, as well as the form of aggregation). Summing or 
>>>>>>>>>>>>>>>>>> maxing over
>>>>>>>>>>>>>>>>>> ints would be a typical example. Though we could pursue 
>>>>>>>>>>>>>>>>>> making this opaque
>>>>>>>>>>>>>>>>>> to the runner in the long run, that's a more speculative 
>>>>>>>>>>>>>>>>>> (and difficult)
>>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>>>>>>>>> aggregate and
>>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself 
>>>>>>>>>>>>>>>>>> understand the
>>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>>>>>>>>> specialization
>>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as 
>>>>>>>>>>>>>>>>>> well.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for
>>>>>>>>>>>>>>>>>> every type X one would have a single URN for UserMetric and 
>>>>>>>>>>>>>>>>>> it spec would
>>>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) 
>>>>>>>>>>>>>>>>>> name.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <
>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>>>>>>>>> I have made a revision today which is to make all
>>>>>>>>>>>>>>>>>>> metrics refer to a primary entity, so I have restructured 
>>>>>>>>>>>>>>>>>>> some of the
>>>>>>>>>>>>>>>>>>> protos a little bit.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The point of this change was to futureproof the
>>>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom 
>>>>>>>>>>>>>>>>>>> aggregation
>>>>>>>>>>>>>>>>>>> functions for its metric updates.
>>>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity
>>>>>>>>>>>>>>>>>>> associated with it (e.g. PCollection, PTransform), we can 
>>>>>>>>>>>>>>>>>>> design an
>>>>>>>>>>>>>>>>>>> approach which forwards the opaque bytes metric updates, 
>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>> deserializing them. These are forwarded to user provided 
>>>>>>>>>>>>>>>>>>> code which then
>>>>>>>>>>>>>>>>>>> would deserialize the metric update payloads and perform 
>>>>>>>>>>>>>>>>>>> the custom
>>>>>>>>>>>>>>>>>>> aggregations.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric
>>>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform 
>>>>>>>>>>>>>>>>>>> names inside
>>>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the 
>>>>>>>>>>>>>>>>>>> metrics as the
>>>>>>>>>>>>>>>>>>> entities are pulled outside of the metric.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <
>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make
>>>>>>>>>>>>>>>>>>>> a decision by Friday, and begin working on related PRs 
>>>>>>>>>>>>>>>>>>>> next week.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before
>>>>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this 
>>>>>>>>>>>>>>>>>>>> thread Friday
>>>>>>>>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <
>>>>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it
>>>>>>>>>>>>>>>>>>>>> easily in
>>>>>>>>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on
>>>>>>>>>>>>>>>>>>>>> this proposal so far. I
>>>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback.
>>>>>>>>>>>>>>>>>>>>> There were some larger
>>>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of
>>>>>>>>>>>>>>>>>>>>> these I have added a
>>>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised
>>>>>>>>>>>>>>>>>>>>> proposal. Please take
>>>>>>>>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please
>>>>>>>>>>>>>>>>>>>>> take another look after
>>>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to