Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Robert Bradshaw Fri, 13 Apr 2018 09:15:06 -0700

On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <[email protected]> wrote:


>
> To Robert's proto:
>
>  // A mapping of entities to (encoded) values.
>>  map<string, bytes> values;
>>
>
> Are the keys here the names of the metrics, aka what is used for URNs in
> the doc?
>
>>
They're the entities to which a metric is attached, e.g. a PTransform, a
PCollection, or perhaps a process/worker.


> }
>>
>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <[email protected]> wrote:
>>>
>>>> Agree with all of this. It echoes a thread on the doc that I was going
>>>> to bring here. Let's keep it simple and use concrete use cases to drive
>>>> additional abstraction if/when it becomes compelling.
>>>>
>>>> Kenn
>>>>
>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <[email protected]>
>>>> wrote:
>>>>
>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>>> supported type" didn't include new ways of aggregating ints. As long as
>>>>> that means we have a fixed set of aggregations (that align with what what
>>>>> users want and metrics back end support) it seems like we are doing user
>>>>> metrics right.
>>>>>
>>>>> - Ben
>>>>>
>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Maybe leave it out until proven it is needed. ATM counters are used a
>>>>>> lot but others are less mainstream so being too fine from the start can
>>>>>> just add complexity and bugs in impls IMHO.
>>>>>>
>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <[email protected]> a
>>>>>> écrit :
>>>>>>
>>>>>>> By "type" of metric, I mean both the data types (including their
>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as would
>>>>>>> double-distribution.
>>>>>>>
>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> When you say type do you mean accumulator type, result type, or
>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, 
>>>>>>>> sumlong,
>>>>>>>> meanlong, etc?
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Fully custom metric types is the "more speculative and difficult"
>>>>>>>>> feature that I was proposing we kick down the road (and may never get 
>>>>>>>>> to).
>>>>>>>>> What I'm suggesting is that we support custom metrics of standard 
>>>>>>>>> type.
>>>>>>>>>
>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The metric api is designed to prevent user defined metric types
>>>>>>>>>> based on the fact they just weren't used enough to justify support.
>>>>>>>>>>
>>>>>>>>>> Is there a reason we are bringing that complexity back? Shouldn't
>>>>>>>>>> we just need the ability for the standard set plus any special system
>>>>>>>>>> metrivs?
>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>
>>>>>>>>>>> One thing that has occurred to me is that we're conflating the
>>>>>>>>>>> idea of custom metrics and custom metric types. I would propose
>>>>>>>>>>> the MetricSpec field be augmented with an additional field "type" 
>>>>>>>>>>> which is
>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of its
>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing 
>>>>>>>>>>> over ints
>>>>>>>>>>> would be a typical example. Though we could pursue making this 
>>>>>>>>>>> opaque to
>>>>>>>>>>> the runner in the long run, that's a more speculative (and 
>>>>>>>>>>> difficult)
>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>> aggregate and
>>>>>>>>>>> report/return to the SDK metrics that it did not itself understand 
>>>>>>>>>>> the
>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>> specialization
>>>>>>>>>>> in the runner itself for metrics that it *did* understand as well.)
>>>>>>>>>>>
>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every type
>>>>>>>>>>> X one would have a single URN for UserMetric and it spec would 
>>>>>>>>>>> designate
>>>>>>>>>>> the type and payload designate the (qualified) name.
>>>>>>>>>>>
>>>>>>>>>>> - Robert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>> I have made a revision today which is to make all metrics refer
>>>>>>>>>>>> to a primary entity, so I have restructured some of the protos a 
>>>>>>>>>>>> little bit.
>>>>>>>>>>>>
>>>>>>>>>>>> The point of this change was to futureproof the possibility of
>>>>>>>>>>>> allowing custom user metrics, with custom aggregation functions 
>>>>>>>>>>>> for its
>>>>>>>>>>>> metric updates.
>>>>>>>>>>>> Now that each metric has an aggregation_entity associated with
>>>>>>>>>>>> it (e.g. PCollection, PTransform), we can design an approach which 
>>>>>>>>>>>> forwards
>>>>>>>>>>>> the opaque bytes metric updates, without deserializing them. These 
>>>>>>>>>>>> are
>>>>>>>>>>>> forwarded to user provided code which then would deserialize the 
>>>>>>>>>>>> metric
>>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it has also simplified some of the URN metric protos,
>>>>>>>>>>>> as they do not need to keep track of ptransform names inside 
>>>>>>>>>>>> themselves
>>>>>>>>>>>> now. The result is simpler structures, for the metrics as the 
>>>>>>>>>>>> entities are
>>>>>>>>>>>> pulled outside of the metric.
>>>>>>>>>>>>
>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a
>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next week.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please make sure that you provide your feedback before then
>>>>>>>>>>>>> and I will post the final decisions made to this thread Friday 
>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it easily
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There were
>>>>>>>>>>>>>> some larger
>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these I
>>>>>>>>>>>>>> have added a
>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised proposal.
>>>>>>>>>>>>>> Please take
>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take
>>>>>>>>>>>>>> another look after
>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to