On Thu, Apr 12, 2018 at 8:17 PM Alex Amato <ajam...@google.com> wrote:
> I agree that there is some confusion about concepts. Here are several concepts which have come up in discussions, as I see them (not official names).
>
> *Metric*
>
>    - For the purposes of my document, I have been referring to a Metric as any sort of information the SDK can send to the Runner.
>    - This does not mean only quantitative, aggregated values.
>    - This can include other useful *monitoring information*, for supporting debugging/monitoring scenarios, such as:
>       - A table of files which are not yet finished reading, causing a streaming pipeline to be blocked.
>    - It has been pointed out to me that when many people hear "metric", a very specific thing comes to mind, in particular quantitative, aggregated values. *That is NOT what my document is limited to. I consider both that type of metric and more arbitrary 'monitoring information', like a table of files with statuses, to be in scope for the proposal.*
>    - Perhaps there should be another word for this concept, but I have not yet come up with a good one; "monitoring information" or "monitoring item", perhaps.
>
> *Metric types/Metric classes*
>
>    - A collection of information reported on ProcessBundleProgressResponse and ProcessBundleResponse from the SDK to the RunnerHarness.
>       - e.g. execution time of ParDo functions.
>    - In my proposal they are defined by a URN and two structs which are serialized into MetricSpec and Metric bytes payload fields, for requesting and responding with the metrics.
>       - e.g. beam:metric:ptransform_execution_times:v1 defines the information needed to describe the execution times of a PTransform.
>    - All metrics which are passed across the Fn API have a *metric type*.
>
> *User metrics*
>
>    - A metric added by a pipeline writer, using an SDK API to create it.
>    - In my proposal the various *UserMetric types are each a Metric Type.*
>       - e.g. "urn:beam:metric:user_distribution_data:v1" and "urn:beam:metric:user_counter_data:v1" define two metric types for packaging these user metrics and communicating them across the Fn API.
>    - SDK writers would need to write code to package the user metrics from SDK API calls into their associated metric types, to send them across the Fn API.
>
> *Custom metric types*
>
>    - A metric type which is not included in a catalog of first-class Beam metrics. These can be thought of as metrics a custom engine+runner+SDK (the system as a whole) collects which are not part of the Beam model.
>       - e.g. a closed-source runner can define its own URNs and metrics, extending the Beam model.
>       - For example, an I/O source specific to a closed-source engine+runner+SDK may export a table of files it is reading, with statuses, as a custom metric type.
>
> *Custom User Metrics with Custom Metric Types*
>
>    - Not proposed to be supported by the doc.
>    - A user-specified metric, written by a pipeline writer with a custom metric type; this would likely be implemented using a general mechanism to attach the custom metric.
>    - May have a custom, user-specified aggregation function as well.
>
> *Reporting metrics to external systems such as Dropwizard*
>
>    - My doc does not specifically cover this; it assumes that a runner harness would be responsible for reporting metrics in formats specific to those external systems, such as Dropwizard. It assumes that the URNs + metric types provided will be specified in enough detail that such a translation would be possible.
>    - Each metric type would need to be handled in the RunnerHarness, to collect and report the metric to an external system.
>    - Some concern has come up about this, and whether it should dictate the format of the metrics which the SDK sends to the RunnerHarness over the Fn API, rather than using the more custom URN+payload approach.
>    - Though there could be URNs specifically designed to do this, the intention of the design in the doc is to not require SDKs to give string "names" to metrics, just to fill in URN payloads; the RunnerHarness will pick names for metrics if needed to send them to external systems.
>
> Just wanted to clarify this a bit. I hope the example of the table of files, being a more complex metric type, describes the usage of custom metric types. I'll update the doc with this.
>

Thank you for this clarification. I think the table of files fits into the model as one of type string-set (with union as aggregation).

> @Robert, I am not sure if you are proposing anything that is not in the current form of the doc.
>

Yes, I am. Currently, the URN of the metric spec specifies both (1) the semantic meaning of this metric (i.e. what exactly is being instrumented, whether that be processing-time seconds or output bytes) and (2) the formatting and aggregation function of the (otherwise opaque) payload bytes. I am proposing we make (2) explicit via a URN in the MetricSpec as well, such that a runner does not need to interpret (1) in order to aggregate, and even report/display, this data if it so chooses.

Concretely,

message MetricSpec {
  // (Required) A URN that describes the accompanying payload.
  // For any URN that is not recognized (by whomever is inspecting
  // it) the parameter payload should be treated as opaque and
  // passed as-is.
  string urn = 1;

  // (Optional) The data specifying any parameters to the URN. If
  // the URN does not require any arguments, this may be omitted.
  bytes parameters_payload = 2;

  // (Required) A URN that describes the type of values this metric
  // records (e.g. durations that should be summed).
}

message Metric[Values] {
  // (Required) The original requesting MetricSpec.
  MetricSpec metric_spec = 1;

  // A mapping of entities to (encoded) values.
  map<string, bytes> values;
}

On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <k...@google.com> wrote:
>
>> Agree with all of this. It echoes a thread on the doc that I was going to bring here. Let's keep it simple and use concrete use cases to drive additional abstraction if/when it becomes compelling.
>>
>> Kenn
>>
>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <bjchamb...@gmail.com> wrote:
>>
>>> Sounds perfect. Just wanted to make sure that "custom metrics of supported type" didn't include new ways of aggregating ints. As long as that means we have a fixed set of aggregations (that align with what users want and metrics backends support), it seems like we are doing user metrics right.
>>>
>>> - Ben
>>>
>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>>> Maybe leave it out until it is proven to be needed. At the moment counters are used a lot, but the others are less mainstream, so being too fine-grained from the start can just add complexity and bugs in implementations, IMHO.
>>>>
>>>> On 12 Apr 2018 08:06, "Robert Bradshaw" <rober...@google.com> wrote:
>>>>
>>>>> By "type" of metric, I mean both the data types (including their encoding) and the accumulator strategy. So sumint would be a type, as would double-distribution.
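As a rough sketch of the split being discussed in this subthread (a semantic URN versus a separate type URN), a runner that only understands a metric's type URN could still combine the otherwise opaque payloads. The URNs, encodings, and function names below are purely illustrative assumptions, not part of the proposal doc or any Beam API:

    # Hypothetical runner-side aggregation keyed only on a declared type URN;
    # the runner never needs to know what the metric semantically measures.
    import struct

    def _decode_int64(payload):
        # The fixed little-endian int64 encoding is an assumption for this sketch.
        return struct.unpack("<q", payload)[0]

    # type URN -> (decoder for one payload, combiner over the decoded values)
    METRIC_TYPES = {
        "beam:metric_type:sum_int64:v1": (_decode_int64, sum),
        "beam:metric_type:max_int64:v1": (_decode_int64, max),
        "beam:metric_type:string_set_union:v1": (
            lambda payload: set(payload.decode("utf-8").split("\n")),
            lambda sets: set().union(*sets)),
    }

    def aggregate(type_urn, encoded_payloads):
        """Combines opaque payloads using only their declared type URN."""
        decode, combine = METRIC_TYPES[type_urn]
        return combine([decode(p) for p in encoded_payloads])

The string-set entry is meant to mirror the "table of files" example above: union is the aggregation, and the semantic meaning stays entirely with whoever requested the metric.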
>>>>>
>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <bjchamb...@gmail.com> wrote:
>>>>>
>>>>>> When you say type, do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc.?
>>>>>>
>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>
>>>>>>> Fully custom metric types is the "more speculative and difficult" feature that I was proposing we kick down the road (and may never get to). What I'm suggesting is that we support custom metrics of standard type.
>>>>>>>
>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <bchamb...@apache.org> wrote:
>>>>>>>
>>>>>>>> The metric API is designed to prevent user-defined metric types, based on the fact that they just weren't used enough to justify support.
>>>>>>>>
>>>>>>>> Is there a reason we are bringing that complexity back? Shouldn't we just need the ability for the standard set, plus any special system metrics?
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>
>>>>>>>>> One thing that has occurred to me is that we're conflating the ideas of custom metrics and custom metric types. I would propose the MetricSpec field be augmented with an additional field "type", which is a URN specifying the type of metric it is (i.e. the contents of its payload, as well as the form of aggregation). Summing or maxing over ints would be a typical example. Though we could pursue making this opaque to the runner in the long run, that's a more speculative (and difficult) feature to tackle. This would allow the runner to at least aggregate, and report/return to the SDK, metrics whose semantic meaning it did not itself understand. (It would probably simplify much of the specialization in the runner itself for metrics that it *did* understand as well.)
>>>>>>>>>
>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every type X, one would have a single URN for UserMetric; its spec would designate the type and its payload would designate the (qualified) name.
>>>>>>>>>
>>>>>>>>> - Robert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <ajam...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>> I have made a revision today, which is to make all metrics refer to a primary entity, so I have restructured some of the protos a little bit.
>>>>>>>>>>
>>>>>>>>>> The point of this change was to future-proof the possibility of allowing custom user metrics, with custom aggregation functions for their metric updates.
>>>>>>>>>> Now that each metric has an aggregation_entity associated with it (e.g. PCollection, PTransform), we can design an approach which forwards the opaque-bytes metric updates without deserializing them. These are forwarded to user-provided code, which would then deserialize the metric update payloads and perform the custom aggregations.
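The per-entity forwarding described in the preceding paragraph could look roughly like the following. This is only a sketch under assumed names (OpaqueMetricForwarder, record_update, and aggregate_all are illustrative, not from the proposal doc or any SDK): the runner buckets opaque payload bytes by (metric URN, aggregation entity) and hands each bucket to user-provided aggregation code without deserializing anything itself.

    from collections import defaultdict

    class OpaqueMetricForwarder:
        """Hypothetical runner-side buffer for opaque metric updates."""

        def __init__(self, user_aggregators):
            # user_aggregators: metric URN -> callable(list of payload bytes) -> bytes
            self._user_aggregators = user_aggregators
            self._updates = defaultdict(list)  # (urn, entity) -> [payload bytes]

        def record_update(self, urn, entity, payload):
            # entity would name e.g. a PTransform or PCollection.
            self._updates[(urn, entity)].append(payload)

        def aggregate_all(self):
            # Invoke the user-provided aggregation once per (urn, entity) bucket,
            # still without interpreting the payload bytes here.
            return {
                (urn, entity): self._user_aggregators[urn](payloads)
                for (urn, entity), payloads in self._updates.items()
            }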
>>>>>>>>>>
>>>>>>>>>> I think it has also simplified some of the URN metric protos, as they do not need to keep track of PTransform names inside themselves now. The result is simpler structures for the metrics, as the entities are pulled outside of the metric.
>>>>>>>>>>
>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw attention to this particular revision.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <ajam...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a decision by Friday, and begin working on related PRs next week.
>>>>>>>>>>>
>>>>>>>>>>> Please make sure that you provide your feedback before then, and I will post the final decisions made to this thread on Friday afternoon.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Nice, I created a short link so people can refer to it easily in future discussions, website, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <ajam...@google.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hello Beam community,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this proposal so far. I have made some revisions based on the feedback. There were some larger questions asking about alternatives. For each of these I have added a section tagged with [Alternatives] and discussed my recommendation as well as a few other choices we considered.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I would appreciate more feedback on the revised proposal. Please take another look and let me know.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take another look after the revisions I have made as well.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>> >> Alex
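For readers following the "user metrics" terminology used throughout the thread: the user-facing side already exists in the SDKs, and the proposal concerns how such calls get packaged into URN-typed payloads over the Fn API. A minimal Beam Python sketch of that user-facing API (the namespace and metric names here are arbitrary):

    import apache_beam as beam
    from apache_beam.metrics.metric import Metrics

    class CountAndMeasure(beam.DoFn):
        def __init__(self):
            # A user counter and a user distribution, declared by the pipeline
            # author; under the proposal, the SDK harness would package these
            # into user-metric types to send across the Fn API.
            self.element_count = Metrics.counter('my_namespace', 'elements')
            self.element_sizes = Metrics.distribution('my_namespace', 'element_size')

        def process(self, element):
            self.element_count.inc()
            self.element_sizes.update(len(element))
            yield element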