Hi folks -

Before we totally go down the path of highly structured metric protos, I'd
like to propose considering a simple metrics interface between the SDK and
the runner.  Something more generic and closer to what most monitoring
systems would use.

To use Spark as an example, the Metric system uses a simple metric format
of name, value and type to report all metrics in a single structure,
regardless of the source or context of the metric.

The subsystems have contracts for what metrics they will expose and how
they are calculated:

Codifying the system metrics in the SDK seems perfectly reasonable - no
reason to make the notion of metric generic at that level.  But at the
point the metric is leaving the SDK and going to the runner, a simpler,
generic encoding of the metrics might make it easier to adapt and maintain
system.  The generic format can include information about downstream
consumers, if that's useful.

Spark supports a number of Metric Sinks - external monitoring systems.  If
runners receive a simple list of metrics, implementing any number of Sinks
for Beam would be straightforward and would generally be a one time
implementation.  If instead all system metrics are sent embedded in a
highly structured, semantically meaningful structure, runner code would
need to be updated to support exporting the new metric. We seem to be
heading in the direction of "if you don't understand this metric, you can't
use it / export it".  But most systems seem to assume metrics are really
simple named values that can be handled a priori.

So I guess my primary question is:  Is it necessary for Beam to treat
metrics as highly semantic, arbitrarily complex data?  Or could they
possibly be the sort of simple named values as they are in most monitoring
systems and in Spark?  With the SDK potentially providing scaffolding to
add meaning and structure, but simplifying that out before leaving SDK
code.  Is the coupling to a semantically meaningful structure between the
SDK and runner and necessary complexity?


On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <rober...@google.com>

> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <ajam...@google.com> wrote:
>> *Thank you for this clarification. I think the table of files fits into
>> the model as one of type string-set (with union as aggregation). *
>> Its not a list of files, its a list of metadata for each file, several
>> pieces of data per file.
>> Are you proposing that there would be separate URNs as well for each
>> entity being measured then, so the the URN defines the type of entity being
>> measured.
>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for
>> PCollection entities
>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for
>> PTransform entities
> Yes. FWIW, it may not even be needed to put this in the name, e.g.
> execution times are never for PCollections, and even if they were it'd be
> semantically a very different beast (which should not re-use the same URN).
> *message MetricSpec {*
>> *  // (Required) A URN that describes the accompanying payload.*
>> *  // For any URN that is not recognized (by whomever is inspecting*
>> *  // it) the parameter payload should be treated as opaque and*
>> *  // passed as-is.*
>> *  string urn = 1;*
>> *  // (Optional) The data specifying any parameters to the URN. If*
>> *  // the URN does not require any arguments, this may be omitted.*
>> *  bytes parameters_payload = 2;*
>> *  // (Required) A URN that describes the type of values this metric*
>> *  // records (e.g. durations that should be summed).*
>> *}*
>> *message Metric[Values] {*
>> * // (Required) The original requesting MetricSpec.*
>> * MetricSpec metric_spec = 1;*
>> * // A mapping of entities to (encoded) values.*
>> * map<string, bytes> values;*
>> This ignores the non-unqiueness of entity identifiers. This is why in my
>> doc, I have specified the entity type and its string identifier
>> @Ken, I believe you have pointed this out in the past, that uniqueness is
>> only guaranteed within a type of entity (all PCollections), but not between
>> entities (A Pcollection and PTransform may have the same identifier).
> See above for why this is not an issue. The extra complexity (in protos
> and code), the inability to use them as map keys, and the fact that they'll
> be 100% redundant for all entities for a given metric convinces me that
> it's not worth creating and tracking an enum for the type alongside the id.
>> *}*
>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <k...@google.com> wrote:
>>>> To Robert's proto:
>>>>  // A mapping of entities to (encoded) values.
>>>>>  map<string, bytes> values;
>>>> Are the keys here the names of the metrics, aka what is used for URNs
>>>> in the doc?
>>> They're the entities to which a metric is attached, e.g. a PTransform, a
>>> PCollection, or perhaps a process/worker.
>>>> }
>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com>
>>>>>> wrote:
>>>>>>> Agree with all of this. It echoes a thread on the doc that I was
>>>>>>> going to bring here. Let's keep it simple and use concrete use cases to
>>>>>>> drive additional abstraction if/when it becomes compelling.
>>>>>>> Kenn
>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com>
>>>>>>> wrote:
>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>>>>>> supported type" didn't include new ways of aggregating ints. As long as
>>>>>>>> that means we have a fixed set of aggregations (that align with what 
>>>>>>>> what
>>>>>>>> users want and metrics back end support) it seems like we are doing 
>>>>>>>> user
>>>>>>>> metrics right.
>>>>>>>> - Ben
>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are
>>>>>>>>> used a lot but others are less mainstream so being too fine from the 
>>>>>>>>> start
>>>>>>>>> can just add complexity and bugs in impls IMHO.
>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <rober...@google.com> a
>>>>>>>>> écrit :
>>>>>>>>>> By "type" of metric, I mean both the data types (including their
>>>>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as 
>>>>>>>>>> would
>>>>>>>>>> double-distribution.
>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <
>>>>>>>>>> bjchamb...@gmail.com> wrote:
>>>>>>>>>>> When you say type do you mean accumulator type, result type, or
>>>>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, 
>>>>>>>>>>> sumlong,
>>>>>>>>>>> meanlong, etc?
>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <
>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>> Fully custom metric types is the "more speculative and
>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road (and 
>>>>>>>>>>>> may
>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom 
>>>>>>>>>>>> metrics of
>>>>>>>>>>>> standard type.
>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <
>>>>>>>>>>>> bchamb...@apache.org> wrote:
>>>>>>>>>>>>> The metric api is designed to prevent user defined metric
>>>>>>>>>>>>> types based on the fact they just weren't used enough to justify 
>>>>>>>>>>>>> support.
>>>>>>>>>>>>> Is there a reason we are bringing that complexity back?
>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any 
>>>>>>>>>>>>> special
>>>>>>>>>>>>> system metrivs?
>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating
>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would 
>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field 
>>>>>>>>>>>>>> "type" which is
>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of 
>>>>>>>>>>>>>> its
>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing 
>>>>>>>>>>>>>> over ints
>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this 
>>>>>>>>>>>>>> opaque to
>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and 
>>>>>>>>>>>>>> difficult)
>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>>>>> aggregate and
>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself 
>>>>>>>>>>>>>> understand the
>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>>>>> specialization
>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as 
>>>>>>>>>>>>>> well.)
>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every
>>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec 
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name.
>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <
>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics
>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the 
>>>>>>>>>>>>>>> protos a
>>>>>>>>>>>>>>> little bit.
>>>>>>>>>>>>>>> The point of this change was to futureproof the possibility
>>>>>>>>>>>>>>> of allowing custom user metrics, with custom aggregation 
>>>>>>>>>>>>>>> functions for its
>>>>>>>>>>>>>>> metric updates.
>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated
>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an 
>>>>>>>>>>>>>>> approach which
>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without deserializing 
>>>>>>>>>>>>>>> them. These
>>>>>>>>>>>>>>> are forwarded to user provided code which then would 
>>>>>>>>>>>>>>> deserialize the metric
>>>>>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric
>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names 
>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the 
>>>>>>>>>>>>>>> metrics as the
>>>>>>>>>>>>>>> entities are pulled outside of the metric.
>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <
>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a
>>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next week.
>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before then
>>>>>>>>>>>>>>>> and I will post the final decisions made to this thread Friday 
>>>>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <
>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it
>>>>>>>>>>>>>>>>> easily in
>>>>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>>>>> ajam...@google.com> wrote:
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There
>>>>>>>>>>>>>>>>> were some larger
>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these
>>>>>>>>>>>>>>>>> I have added a
>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised
>>>>>>>>>>>>>>>>> proposal. Please take
>>>>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take
>>>>>>>>>>>>>>>>> another look after
>>>>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >

Reply via email to