Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Robert Bradshaw Fri, 13 Apr 2018 13:08:33 -0700

+1 to keeping things simple, both in code and the model to understand.

I like thinking of things as (name, value, type) triples. Historically,
we've packed the entity name (e.g. PTransform name) into the string name
field and parsed it out in various places; I think it's worth pulling this
out and making it explicit instead, so metrics would be (name, entity,
value, type) tuples. In the current proposal:


Name is the URN + a possible bytes payload. (Actually, it's a bit unclear
if there's any relationship between counters with the same name and
different payloads. Also, the only use for payloads is because "User
Counter" is currently a single URN, rather than using the namespacing
characteristics of URNs to map user names onto distinct metric names. A
payload avoids the messiness of having to pack (and parse) arbitrary
parameters into a name though.) If we're going to choose names that the
system and sdks agree to have specific meanings, and to avoid accidental
collisions, making them full-fledged documented URNs has value.

Value is the "payload". Likely worth changing the name to avoid confusion
with the payload above. It's bytes because it depends on the type. I would
try to avoid nesting it too deeply (e.g. a payload within a payload). If we
thing the types are generally limited, another option would be a oneof
field (with a bytes option just in case) for transparency. There are pros
and cons going this route.

Type is what I proposed we add, instead of it being implicit in the name
(and unknowable if one does not recognize the name). This makes things more
open-ended and easier to evolve and work with.

Entity could be generalized to Label, or LabelSet if desired. But as
mentioned I think it makes sense to pull this out as a separate field,
especially when it makes sense to aggregate a single named counter across
labels as well as for a single label (e.g. execution time of composite
transforms).

- Robert



On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <foeg...@google.com> wrote:

> Hi folks -
>
> Before we totally go down the path of highly structured metric protos, I'd
> like to propose considering a simple metrics interface between the SDK and
> the runner.  Something more generic and closer to what most monitoring
> systems would use.
>
> To use Spark as an example, the Metric system uses a simple metric format
> of name, value and type to report all metrics in a single structure,
> regardless of the source or context of the metric.
>
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html
>
> The subsystems have contracts for what metrics they will expose and how
> they are calculated:
>
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html
>
> Codifying the system metrics in the SDK seems perfectly reasonable - no
> reason to make the notion of metric generic at that level.  But at the
> point the metric is leaving the SDK and going to the runner, a simpler,
> generic encoding of the metrics might make it easier to adapt and maintain
> system.  The generic format can include information about downstream
> consumers, if that's useful.
>
> Spark supports a number of Metric Sinks - external monitoring systems.  If
> runners receive a simple list of metrics, implementing any number of Sinks
> for Beam would be straightforward and would generally be a one time
> implementation.  If instead all system metrics are sent embedded in a
> highly structured, semantically meaningful structure, runner code would
> need to be updated to support exporting the new metric. We seem to be
> heading in the direction of "if you don't understand this metric, you can't
> use it / export it".  But most systems seem to assume metrics are really
> simple named values that can be handled a priori.
>
> So I guess my primary question is:  Is it necessary for Beam to treat
> metrics as highly semantic, arbitrarily complex data?  Or could they
> possibly be the sort of simple named values as they are in most monitoring
> systems and in Spark?  With the SDK potentially providing scaffolding to
> add meaning and structure, but simplifying that out before leaving SDK
> code.  Is the coupling to a semantically meaningful structure between the
> SDK and runner and necessary complexity?
>
> Andrea
>
>
>
> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> wrote:
>>
>>>
>>> *Thank you for this clarification. I think the table of files fits into
>>> the model as one of type string-set (with union as aggregation). *
>>> Its not a list of files, its a list of metadata for each file, several
>>> pieces of data per file.
>>>
>>> Are you proposing that there would be separate URNs as well for each
>>> entity being measured then, so the the URN defines the type of entity being
>>> measured.
>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for
>>> PCollection entities
>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for
>>> PTransform entities
>>>
>>
>> Yes. FWIW, it may not even be needed to put this in the name, e.g.
>> execution times are never for PCollections, and even if they were it'd be
>> semantically a very different beast (which should not re-use the same URN).
>>
>> *message MetricSpec {*
>>> *  // (Required) A URN that describes the accompanying payload.*
>>> *  // For any URN that is not recognized (by whomever is inspecting*
>>> *  // it) the parameter payload should be treated as opaque and*
>>> *  // passed as-is.*
>>> *  string urn = 1;*
>>>
>>> *  // (Optional) The data specifying any parameters to the URN. If*
>>> *  // the URN does not require any arguments, this may be omitted.*
>>> *  bytes parameters_payload = 2;*
>>>
>>> *  // (Required) A URN that describes the type of values this metric*
>>> *  // records (e.g. durations that should be summed).*
>>> *}*
>>>
>>> *message Metric[Values] {*
>>> * // (Required) The original requesting MetricSpec.*
>>> * MetricSpec metric_spec = 1;*
>>>
>>> * // A mapping of entities to (encoded) values.*
>>> * map<string, bytes> values;*
>>> This ignores the non-unqiueness of entity identifiers. This is why in my
>>> doc, I have specified the entity type and its string identifier
>>> @Ken, I believe you have pointed this out in the past, that uniqueness
>>> is only guaranteed within a type of entity (all PCollections), but not
>>> between entities (A Pcollection and PTransform may have the same
>>> identifier).
>>>
>>
>> See above for why this is not an issue. The extra complexity (in protos
>> and code), the inability to use them as map keys, and the fact that they'll
>> be 100% redundant for all entities for a given metric convinces me that
>> it's not worth creating and tracking an enum for the type alongside the id.
>>
>>
>>> *}*
>>>
>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com> wrote:
>>>>
>>>>>
>>>>> To Robert's proto:
>>>>>
>>>>>  // A mapping of entities to (encoded) values.
>>>>>>  map<string, bytes> values;
>>>>>>
>>>>>
>>>>> Are the keys here the names of the metrics, aka what is used for URNs
>>>>> in the doc?
>>>>>
>>>>>>
>>>> They're the entities to which a metric is attached, e.g. a PTransform,
>>>> a PCollection, or perhaps a process/worker.
>>>>
>>>>
>>>>> }
>>>>>>
>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was
>>>>>>>> going to bring here. Let's keep it simple and use concrete use cases to
>>>>>>>> drive additional abstraction if/when it becomes compelling.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>>>>>>> supported type" didn't include new ways of aggregating ints. As long 
>>>>>>>>> as
>>>>>>>>> that means we have a fixed set of aggregations (that align with what 
>>>>>>>>> what
>>>>>>>>> users want and metrics back end support) it seems like we are doing 
>>>>>>>>> user
>>>>>>>>> metrics right.
>>>>>>>>>
>>>>>>>>> - Ben
>>>>>>>>>
>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are
>>>>>>>>>> used a lot but others are less mainstream so being too fine from the 
>>>>>>>>>> start
>>>>>>>>>> can just add complexity and bugs in impls IMHO.
>>>>>>>>>>
>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> a
>>>>>>>>>> écrit :
>>>>>>>>>>
>>>>>>>>>>> By "type" of metric, I mean both the data types (including their
>>>>>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as 
>>>>>>>>>>> would
>>>>>>>>>>> double-distribution.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <
>>>>>>>>>>> bjchamb...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> When you say type do you mean accumulator type, result type, or
>>>>>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, 
>>>>>>>>>>>> sumlong,
>>>>>>>>>>>> meanlong, etc?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <
>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Fully custom metric types is the "more speculative and
>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road 
>>>>>>>>>>>>> (and may
>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom 
>>>>>>>>>>>>> metrics of
>>>>>>>>>>>>> standard type.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <
>>>>>>>>>>>>> bchamb...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric
>>>>>>>>>>>>>> types based on the fact they just weren't used enough to justify 
>>>>>>>>>>>>>> support.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back?
>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any 
>>>>>>>>>>>>>> special
>>>>>>>>>>>>>> system metrivs?
>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating
>>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would 
>>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field 
>>>>>>>>>>>>>>> "type" which is
>>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of 
>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing 
>>>>>>>>>>>>>>> over ints
>>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this 
>>>>>>>>>>>>>>> opaque to
>>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and 
>>>>>>>>>>>>>>> difficult)
>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>>>>>> aggregate and
>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself 
>>>>>>>>>>>>>>> understand the
>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>>>>>> specialization
>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as 
>>>>>>>>>>>>>>> well.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every
>>>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec 
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <
>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics
>>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the 
>>>>>>>>>>>>>>>> protos a
>>>>>>>>>>>>>>>> little bit.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The point of this change was to futureproof the possibility
>>>>>>>>>>>>>>>> of allowing custom user metrics, with custom aggregation 
>>>>>>>>>>>>>>>> functions for its
>>>>>>>>>>>>>>>> metric updates.
>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated
>>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an 
>>>>>>>>>>>>>>>> approach which
>>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without 
>>>>>>>>>>>>>>>> deserializing them. These
>>>>>>>>>>>>>>>> are forwarded to user provided code which then would 
>>>>>>>>>>>>>>>> deserialize the metric
>>>>>>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric
>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names 
>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the 
>>>>>>>>>>>>>>>> metrics as the
>>>>>>>>>>>>>>>> entities are pulled outside of the metric.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <
>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a
>>>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next 
>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before
>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this thread 
>>>>>>>>>>>>>>>>> Friday
>>>>>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <
>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it
>>>>>>>>>>>>>>>>>> easily in
>>>>>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There
>>>>>>>>>>>>>>>>>> were some larger
>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these
>>>>>>>>>>>>>>>>>> I have added a
>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised
>>>>>>>>>>>>>>>>>> proposal. Please take
>>>>>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please
>>>>>>>>>>>>>>>>>> take another look after
>>>>>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to