Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Kenneth Knowles Fri, 13 Apr 2018 08:31:35 -0700

Thanks for the extensive clarification and +1 to Robert's distinction. It
seems fundamental, perfectly analogous to the distinction between a
variable and its type. The universe of "things you measure" is unbounded,
while the universe of "ways to measure" is vastly smaller and separable.


The remaining debate was around "ways to measure", are maybe half a dozen
in common use and widespread agreement as to what they are and how they
work - basically what Dropwizard metrics provides.

So I still have doubts about building an abstracted extensible model for
shoehorning in "any message you want to send and collect" to an API and
concept with an established meaning and scope, especially since we only
have one or two motivating examples. OTOH these examples are esoteric
enough that it really wouldn't make sense to bake them into anything, so
they really do require a generic framework. The question is: does it make
sense to be _this_ framework?

So when you say "Perhaps there should be another word for this concept" I
have to agree. Often naming discussions can be a useless time sink, but in
this case I think are more in the realm of "should this be called
'addition' or 'exponentiation'?" where the name actually matters quite a
lot. But again, OTOH, for users as long as counters, distributions, gauges,
and the like are usable without having to know anything else or think in
terms of this abstract framework, then I can reluctantly accept calling it
"Metric" at the proto layer in order to use the plumbing we've already half
built.

To Robert's proto:

 // A mapping of entities to (encoded) values.
>  map<string, bytes> values;
>

Are the keys here the names of the metrics, aka what is used for URNs in
the doc?

Kenn



> }
>
> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com> wrote:
>>
>>> Agree with all of this. It echoes a thread on the doc that I was going
>>> to bring here. Let's keep it simple and use concrete use cases to drive
>>> additional abstraction if/when it becomes compelling.
>>>
>>> Kenn
>>>
>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com>
>>> wrote:
>>>
>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>> supported type" didn't include new ways of aggregating ints. As long as
>>>> that means we have a fixed set of aggregations (that align with what what
>>>> users want and metrics back end support) it seems like we are doing user
>>>> metrics right.
>>>>
>>>> - Ben
>>>>
>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Maybe leave it out until proven it is needed. ATM counters are used a
>>>>> lot but others are less mainstream so being too fine from the start can
>>>>> just add complexity and bugs in impls IMHO.
>>>>>
>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> a
>>>>> écrit :
>>>>>
>>>>>> By "type" of metric, I mean both the data types (including their
>>>>>> encoding) and accumulator strategy. So sumint would be a type, as would
>>>>>> double-distribution.
>>>>>>
>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <bjchamb...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> When you say type do you mean accumulator type, result type, or
>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, 
>>>>>>> sumlong,
>>>>>>> meanlong, etc?
>>>>>>>
>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <rober...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Fully custom metric types is the "more speculative and difficult"
>>>>>>>> feature that I was proposing we kick down the road (and may never get 
>>>>>>>> to).
>>>>>>>> What I'm suggesting is that we support custom metrics of standard type.
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <bchamb...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The metric api is designed to prevent user defined metric types
>>>>>>>>> based on the fact they just weren't used enough to justify support.
>>>>>>>>>
>>>>>>>>> Is there a reason we are bringing that complexity back? Shouldn't
>>>>>>>>> we just need the ability for the standard set plus any special system
>>>>>>>>> metrivs?
>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <rober...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>
>>>>>>>>>> One thing that has occurred to me is that we're conflating the
>>>>>>>>>> idea of custom metrics and custom metric types. I would propose
>>>>>>>>>> the MetricSpec field be augmented with an additional field "type" 
>>>>>>>>>> which is
>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of its
>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing over 
>>>>>>>>>> ints
>>>>>>>>>> would be a typical example. Though we could pursue making this 
>>>>>>>>>> opaque to
>>>>>>>>>> the runner in the long run, that's a more speculative (and difficult)
>>>>>>>>>> feature to tackle. This would allow the runner to at least aggregate 
>>>>>>>>>> and
>>>>>>>>>> report/return to the SDK metrics that it did not itself understand 
>>>>>>>>>> the
>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>> specialization
>>>>>>>>>> in the runner itself for metrics that it *did* understand as well.)
>>>>>>>>>>
>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every type
>>>>>>>>>> X one would have a single URN for UserMetric and it spec would 
>>>>>>>>>> designate
>>>>>>>>>> the type and payload designate the (qualified) name.
>>>>>>>>>>
>>>>>>>>>> - Robert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <ajam...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>> I have made a revision today which is to make all metrics refer
>>>>>>>>>>> to a primary entity, so I have restructured some of the protos a 
>>>>>>>>>>> little bit.
>>>>>>>>>>>
>>>>>>>>>>> The point of this change was to futureproof the possibility of
>>>>>>>>>>> allowing custom user metrics, with custom aggregation functions for 
>>>>>>>>>>> its
>>>>>>>>>>> metric updates.
>>>>>>>>>>> Now that each metric has an aggregation_entity associated with
>>>>>>>>>>> it (e.g. PCollection, PTransform), we can design an approach which 
>>>>>>>>>>> forwards
>>>>>>>>>>> the opaque bytes metric updates, without deserializing them. These 
>>>>>>>>>>> are
>>>>>>>>>>> forwarded to user provided code which then would deserialize the 
>>>>>>>>>>> metric
>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>
>>>>>>>>>>> I think it has also simplified some of the URN metric protos, as
>>>>>>>>>>> they do not need to keep track of ptransform names inside 
>>>>>>>>>>> themselves now.
>>>>>>>>>>> The result is simpler structures, for the metrics as the entities 
>>>>>>>>>>> are
>>>>>>>>>>> pulled outside of the metric.
>>>>>>>>>>>
>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <ajam...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a
>>>>>>>>>>>> decision by Friday, and begin working on related PRs next week.
>>>>>>>>>>>>
>>>>>>>>>>>> Please make sure that you provide your feedback before then and
>>>>>>>>>>>> I will post the final decisions made to this thread Friday 
>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <ieme...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Nice, I created a short link so people can refer to it easily
>>>>>>>>>>>>> in
>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>> >> have made some revisions based on the feedback. There were
>>>>>>>>>>>>> some larger
>>>>>>>>>>>>> >> questions asking about alternatives. For each of these I
>>>>>>>>>>>>> have added a
>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> I would appreciate more feedback on the revised proposal.
>>>>>>>>>>>>> Please take
>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take
>>>>>>>>>>>>> another look after
>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to