Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Kenneth Knowles Fri, 13 Apr 2018 13:19:47 -0700

On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw <rober...@google.com> wrote:


> Also, the only use for payloads is because "User Counter" is currently a
> single URN, rather than using the namespacing characteristics of URNs to
> map user names onto distinct metric names.
>

Can they be URNs? I don't see value in having a "user metric" URN where you
then have to look elsewhere for what the real name is.

Kenn


A payload avoids the messiness of having to pack (and parse) arbitrary
> parameters into a name though.) If we're going to choose names that the
> system and sdks agree to have specific meanings, and to avoid accidental
> collisions, making them full-fledged documented URNs has value.
>
> Value is the "payload". Likely worth changing the name to avoid confusion
> with the payload above. It's bytes because it depends on the type. I would
> try to avoid nesting it too deeply (e.g. a payload within a payload). If we
> thing the types are generally limited, another option would be a oneof
> field (with a bytes option just in case) for transparency. There are pros
> and cons going this route.
>
> Type is what I proposed we add, instead of it being implicit in the name
> (and unknowable if one does not recognize the name). This makes things more
> open-ended and easier to evolve and work with.
>
> Entity could be generalized to Label, or LabelSet if desired. But as
> mentioned I think it makes sense to pull this out as a separate field,
> especially when it makes sense to aggregate a single named counter across
> labels as well as for a single label (e.g. execution time of composite
> transforms).
>
> - Robert
>
>
>
> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com>
> wrote:
>
>> Hi folks -
>>
>> Before we totally go down the path of highly structured metric protos,
>> I'd like to propose considering a simple metrics interface between the SDK
>> and the runner.  Something more generic and closer to what most monitoring
>> systems would use.
>>
>> To use Spark as an example, the Metric system uses a simple metric format
>> of name, value and type to report all metrics in a single structure,
>> regardless of the source or context of the metric.
>>
>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html
>>
>> The subsystems have contracts for what metrics they will expose and how
>> they are calculated:
>>
>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html
>>
>> Codifying the system metrics in the SDK seems perfectly reasonable - no
>> reason to make the notion of metric generic at that level.  But at the
>> point the metric is leaving the SDK and going to the runner, a simpler,
>> generic encoding of the metrics might make it easier to adapt and maintain
>> system.  The generic format can include information about downstream
>> consumers, if that's useful.
>>
>> Spark supports a number of Metric Sinks - external monitoring systems.
>> If runners receive a simple list of metrics, implementing any number of
>> Sinks for Beam would be straightforward and would generally be a one time
>> implementation.  If instead all system metrics are sent embedded in a
>> highly structured, semantically meaningful structure, runner code would
>> need to be updated to support exporting the new metric. We seem to be
>> heading in the direction of "if you don't understand this metric, you can't
>> use it / export it".  But most systems seem to assume metrics are really
>> simple named values that can be handled a priori.
>>
>> So I guess my primary question is:  Is it necessary for Beam to treat
>> metrics as highly semantic, arbitrarily complex data?  Or could they
>> possibly be the sort of simple named values as they are in most monitoring
>> systems and in Spark?  With the SDK potentially providing scaffolding to
>> add meaning and structure, but simplifying that out before leaving SDK
>> code.  Is the coupling to a semantically meaningful structure between the
>> SDK and runner and necessary complexity?
>>
>> Andrea
>>
>>
>>
>> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> wrote:
>>>
>>>>
>>>> *Thank you for this clarification. I think the table of files fits into
>>>> the model as one of type string-set (with union as aggregation). *
>>>> Its not a list of files, its a list of metadata for each file, several
>>>> pieces of data per file.
>>>>
>>>> Are you proposing that there would be separate URNs as well for each
>>>> entity being measured then, so the the URN defines the type of entity being
>>>> measured.
>>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for
>>>> PCollection entities
>>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for
>>>> PTransform entities
>>>>
>>>
>>> Yes. FWIW, it may not even be needed to put this in the name, e.g.
>>> execution times are never for PCollections, and even if they were it'd be
>>> semantically a very different beast (which should not re-use the same URN).
>>>
>>> *message MetricSpec {*
>>>> *  // (Required) A URN that describes the accompanying payload.*
>>>> *  // For any URN that is not recognized (by whomever is inspecting*
>>>> *  // it) the parameter payload should be treated as opaque and*
>>>> *  // passed as-is.*
>>>> *  string urn = 1;*
>>>>
>>>> *  // (Optional) The data specifying any parameters to the URN. If*
>>>> *  // the URN does not require any arguments, this may be omitted.*
>>>> *  bytes parameters_payload = 2;*
>>>>
>>>> *  // (Required) A URN that describes the type of values this metric*
>>>> *  // records (e.g. durations that should be summed).*
>>>> *}*
>>>>
>>>> *message Metric[Values] {*
>>>> * // (Required) The original requesting MetricSpec.*
>>>> * MetricSpec metric_spec = 1;*
>>>>
>>>> * // A mapping of entities to (encoded) values.*
>>>> * map<string, bytes> values;*
>>>> This ignores the non-unqiueness of entity identifiers. This is why in
>>>> my doc, I have specified the entity type and its string identifier
>>>> @Ken, I believe you have pointed this out in the past, that uniqueness
>>>> is only guaranteed within a type of entity (all PCollections), but not
>>>> between entities (A Pcollection and PTransform may have the same
>>>> identifier).
>>>>
>>>
>>> See above for why this is not an issue. The extra complexity (in protos
>>> and code), the inability to use them as map keys, and the fact that they'll
>>> be 100% redundant for all entities for a given metric convinces me that
>>> it's not worth creating and tracking an enum for the type alongside the id.
>>>
>>>
>>>> *}*
>>>>
>>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> To Robert's proto:
>>>>>>
>>>>>>  // A mapping of entities to (encoded) values.
>>>>>>>  map<string, bytes> values;
>>>>>>>
>>>>>>
>>>>>> Are the keys here the names of the metrics, aka what is used for URNs
>>>>>> in the doc?
>>>>>>
>>>>>>>
>>>>> They're the entities to which a metric is attached, e.g. a PTransform,
>>>>> a PCollection, or perhaps a process/worker.
>>>>>
>>>>>
>>>>>> }
>>>>>>>
>>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was
>>>>>>>>> going to bring here. Let's keep it simple and use concrete use cases 
>>>>>>>>> to
>>>>>>>>> drive additional abstraction if/when it becomes compelling.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>>>>>>>> supported type" didn't include new ways of aggregating ints. As long 
>>>>>>>>>> as
>>>>>>>>>> that means we have a fixed set of aggregations (that align with what 
>>>>>>>>>> what
>>>>>>>>>> users want and metrics back end support) it seems like we are doing 
>>>>>>>>>> user
>>>>>>>>>> metrics right.
>>>>>>>>>>
>>>>>>>>>> - Ben
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are
>>>>>>>>>>> used a lot but others are less mainstream so being too fine from 
>>>>>>>>>>> the start
>>>>>>>>>>> can just add complexity and bugs in impls IMHO.
>>>>>>>>>>>
>>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com>
>>>>>>>>>>> a écrit :
>>>>>>>>>>>
>>>>>>>>>>>> By "type" of metric, I mean both the data types (including
>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a 
>>>>>>>>>>>> type, as
>>>>>>>>>>>> would double-distribution.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <
>>>>>>>>>>>> bjchamb...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When you say type do you mean accumulator type, result type,
>>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of 
>>>>>>>>>>>>> sumint,
>>>>>>>>>>>>> sumlong, meanlong, etc?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <
>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and
>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road 
>>>>>>>>>>>>>> (and may
>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom 
>>>>>>>>>>>>>> metrics of
>>>>>>>>>>>>>> standard type.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <
>>>>>>>>>>>>>> bchamb...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric
>>>>>>>>>>>>>>> types based on the fact they just weren't used enough to 
>>>>>>>>>>>>>>> justify support.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back?
>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus 
>>>>>>>>>>>>>>> any special
>>>>>>>>>>>>>>> system metrivs?
>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating
>>>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would 
>>>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field 
>>>>>>>>>>>>>>>> "type" which is
>>>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents 
>>>>>>>>>>>>>>>> of its
>>>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or 
>>>>>>>>>>>>>>>> maxing over ints
>>>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this 
>>>>>>>>>>>>>>>> opaque to
>>>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and 
>>>>>>>>>>>>>>>> difficult)
>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>>>>>>> aggregate and
>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself 
>>>>>>>>>>>>>>>> understand the
>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>>>>>>> specialization
>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as 
>>>>>>>>>>>>>>>> well.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every
>>>>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec 
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <
>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics
>>>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the 
>>>>>>>>>>>>>>>>> protos a
>>>>>>>>>>>>>>>>> little bit.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The point of this change was to futureproof the
>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom 
>>>>>>>>>>>>>>>>> aggregation
>>>>>>>>>>>>>>>>> functions for its metric updates.
>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated
>>>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an 
>>>>>>>>>>>>>>>>> approach which
>>>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without 
>>>>>>>>>>>>>>>>> deserializing them. These
>>>>>>>>>>>>>>>>> are forwarded to user provided code which then would 
>>>>>>>>>>>>>>>>> deserialize the metric
>>>>>>>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric
>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names 
>>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the 
>>>>>>>>>>>>>>>>> metrics as the
>>>>>>>>>>>>>>>>> entities are pulled outside of the metric.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <
>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a
>>>>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next 
>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before
>>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this thread 
>>>>>>>>>>>>>>>>>> Friday
>>>>>>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <
>>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it
>>>>>>>>>>>>>>>>>>> easily in
>>>>>>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There
>>>>>>>>>>>>>>>>>>> were some larger
>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of
>>>>>>>>>>>>>>>>>>> these I have added a
>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised
>>>>>>>>>>>>>>>>>>> proposal. Please take
>>>>>>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please
>>>>>>>>>>>>>>>>>>> take another look after
>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to