Thanks for bringing this to the list; it's a good question.

I think the difficulty comes from trying to statically define a lists
of possibilities that should instead be runtime values. E.g. we
currently we're up to about a dozen distinct types, and having a
setter for each is both verbose and not very extensible (especially
replicated cross language). I'm not sure the set of possible labels is
fixed either. Generally in the FnAPI we've been using shared constants
for this kind of thing instead. (I was wary about the protos for the
same reasons; it would be good to avoid leaking this through to each
of the various SDKs.) The amount of static typing/validation one gets
depends on how much logic you build into each of these methods (e.g.
does it (almost?) always "metric" which is assumed to already be
encoded correctly, or a specific type that has tradeoffs with the
amount you can do generically (e.g. we have code that first loops over
counters, then over distributions, then over gauges, and I don't think
we want continue that pattern all M places for all N types)).

I would probably lean towards mostly doing runtime validation here.
Specifically, one would have a data file that defines, for each known
URN, its type along with the set of permitted/expected/required
labels. On construction a MonitoringInfo data point, one would provide
a URN + value + labelMap, and optionally a type. On construction, the
constructor (method, factory, whatever) would look up the URN to
determine the type (or throw an error if it's both not known and not
provided), which could be then used to fetch an encoder of sorts (that
can go from value <-> proto encoding, possibly with some validation).
If labels and/or annotations are provided and the URN is known, we can
validate these as well.

As for proto/enums vs. yaml, the former is nice because code
generation comes for free, but has turned out to be much more verbose
(both the definition and the use) and I'm still on the fence whether
it's a net win.

(Unfortunately AutoValue won't help much here, as the ultimate goal is
to wrap a very nested proto OneOf, ideally with some validation.
(Also, in Python, generally, having optional, named arguments makes
this kind of builder pattern less needed.))

On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles <[email protected]> wrote:
>
> FWIW AutoValue will build most of that class for you, if it is as you say.
>
> Kenn
>
> On Tue, Oct 23, 2018 at 6:04 PM Alex Amato <[email protected]> wrote:
>>
>> Hi Robert + beam dev list,
>>
>> I was thinking about your feedback in PR#6205, and agree that this 
>> monitoring_infos.py became a bit big.
>>
>> I'm working on the Java Implementation of this now, and want to incorporate 
>> some of these ideas and improve on this.
>>
>> I that that I should make something like a MonitoringInfoBuilder class. With 
>> a few methods
>>
>> setUrn
>> setTimestamp
>> setting the value (One method for each Type we support. Setting this will 
>> also set the type string)
>>
>> setInt64CounterValue
>> setDoubleCounterValue
>> setLatestInt64
>> setTopNInt64
>> setMonitoringDataTable
>> setDistributionInt64
>> ...
>>
>> setting labels (will set the key and value properly for the label)
>>
>> setPTransform(value)
>> setPcollection(value)
>> ...
>>
>>
>> I think this will make building a metric much easier, you would just call 
>> the 4 methods and the .build(). These builders are common in Java. (I guess 
>> there is a similar thing way we could do in python? I'd like to go back and 
>> refactor that as well when I am done)
>>
>> -------------
>>
>> As for your other suggestion to define metrics in the proto/enum file 
>> instead of the yaml file. I am not too sure about the best strategy for 
>> this. My initial thoughts are:
>>
>> Make a proto extension allowing you to describe/define a MonitoringInfo's 
>> (the same info as the metric_definitions.yaml file):
>>
>> URN
>> Type
>> Labels required
>> Annotations: Description, Units, etc.
>>
>> Make the builder read in that MonitoringInfo definision/description assert 
>> everything is set properly? I think this would be a decent data driven 
>> approach.
>>
>> I was wondering if you had something else in mind?
>>
>> Thanks
>> Alex
>>
>>

Reply via email to