Re: About the Gauge metric API

Ben Chambers Fri, 06 Apr 2018 12:23:47 -0700

Generally strong +1 to everything Bill said. I would suggest though that
the per-worker segmentation might be specified using some more general
tagging/labeling API. For instance, all of the following seem like
reasonable uses to support:


1. Gauge that is tagged with worker to get per-worker segmentation (such as
queue size, memory usage, etc.)
2. Gauge that is tagged with the "key" being processed. Would be useful for
things like how much data is buffered, where are watermark holds for each
key, etc. If processing is partitioned by key, this is strictly more
granular than per-worker.
3. Counter tagged with the "key" being processed. Would be useful for time
spent processing each key, etc.

On Fri, Apr 6, 2018 at 11:39 AM Bill Neubauer <[email protected]> wrote:

> Thanks for unraveling those themes, Pablo!
>
> 1. Seems reasonable in light of behaviors metrics backends support.
> 2. Those same backends support histogramming of data, so having integer
> types is very useful.
> 3. I believe that is the case, for the reasons I mentioned earlier, Gauges
> should only clobber previously values reported by the same entity. Two
> workers with the same gauge should not be overwriting each other's values,
> only their own. This implies per-worker segmentation.
>
>
> On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada <[email protected]> wrote:
>
>> Nobody wants to get rid of Gauges. I see that we have three separate
>> themes being discussed here, and I think it's useful to point them out and
>> address them independently:
>>
>> 1. Whether Gauges should change to hold string values.
>> 2. If Gauges are to support string values, whether Gauges should also
>> continue to have an int API.
>> 3. Whether Beam should support some sort of label / tag / worker-id for
>> aggregation of Gauges (maybe other metrics?)
>>
>> -P.
>>
>> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers <[email protected]>
>> wrote:
>>
>>> Gauges are incredibly useful for exposing the current state of the
>>> system. For instance, number of elements in a queue, current memory usage,
>>> number of RPCs in flight, etc. As mentioned above, these concepts exist in
>>> numerous systems for monitoring distributed environments, including
>>> Stackdriver Monitoring. The key to making them work is the addition of
>>> labels or tags, which as an aside are also useful for *all* metric types,
>>> not just Gauges.
>>>
>>> If Beam gets rid of Gauges, how would we go about exporting "current"
>>> values like memory usage, RPCs in flight, etc.?
>>>
>>> -- Ben
>>>
>>> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles <[email protected]> wrote:
>>>
>>>> Just naively - the use cases that Gauge addresses seem relevant, and
>>>> the information seems feasible to gather and present. The bit that doesn't
>>>> seem to make sense is aggregating gauges by clobbering each other. So I
>>>> think that's just +1 Ben?
>>>>
>>>> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi <[email protected]>
>>>> wrote:
>>>>
>>>>> I am not opposed to removing other data types, though they are extra
>>>>> convenience for user.
>>>>>
>>>>> In Scott's example above, if the metric is a counter, what are the
>>>>> guarantees provided? E.g. would it match the global count using GBK? If
>>>>> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
>>>>> backlog for each Kafka partition/split).
>>>>>
>>>>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> A String API makes it clear(er) that the values will not be
>>>>>> aggregated in any way across workers. I don't think retaining both APIs
>>>>>> (except for possibly some short migration period) worthwhile. On another
>>>>>> note, I still find the distributed gague API to be a bit odd in general.
>>>>>>
>>>>>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I would be in favor of replacing the existing Gauge.set(long) API
>>>>>>>> with the String version and removing the old one. This would be a 
>>>>>>>> breaking
>>>>>>>> change. However this is a relatively new API and is still marked
>>>>>>>> @Experimental. Keeping the old API would retain the potential 
>>>>>>>> confusion.
>>>>>>>> It's better to simplify the API surface: having two APIs makes it less
>>>>>>>> clear which one users should choose.
>>>>>>>
>>>>>>>
>>>>>>> Supporting additional data types sounds good. But the above states
>>>>>>> string API will replace the existing API. I do not see how string API 
>>>>>>> makes
>>>>>>> the semantics more clear.  Semantically both are same to the user.
>>>>>>>
>>>>>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ben : D
>>>>>>>>
>>>>>>>> Sure, that's reasonable. And perhaps I started the discussion in
>>>>>>>> the wrong direction. I'm not questioning the utility of Gauge metrics.
>>>>>>>>
>>>>>>>> What I'm saying is that Beam only supports integers,, but Gauges
>>>>>>>> are aggregated by dropping old values depending on their update times; 
>>>>>>>> so
>>>>>>>> it might be desirable to not restrict the data type to just integers.
>>>>>>>>
>>>>>>>> -P.
>>>>>>>>
>>>>>>>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> See for instance how gauge metrics are handled in Prometheus,
>>>>>>>>> Datadog and Stackdriver monitoring. Gauges are perfect for use in
>>>>>>>>> distributed systems, they just need to be properly labeled. Perhaps we
>>>>>>>>> should apply a default tag or allow users to specify one.
>>>>>>>>>
>>>>>>>>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Some metrics backend label the value, for instance with the
>>>>>>>>>> worker that sent it. Then the aggregation is latest per label. This 
>>>>>>>>>> makes
>>>>>>>>>> it useful for holding values such as "memory usage" that need to hold
>>>>>>>>>> current value.
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 on the proposal to support a "String" gauge.
>>>>>>>>>>>
>>>>>>>>>>> To expand a bit, the current API doesn't make it clear that the
>>>>>>>>>>> gauge value is based on local state. If a runner chooses to 
>>>>>>>>>>> parallelize a
>>>>>>>>>>> DoFn across many workers, each worker will have its own local Gauge 
>>>>>>>>>>> metric
>>>>>>>>>>> and its updates will overwrite other values. For example, from the 
>>>>>>>>>>> API it
>>>>>>>>>>> looks like you could use a gauge to implement your own element 
>>>>>>>>>>> count metric:
>>>>>>>>>>>
>>>>>>>>>>> long count = 0;
>>>>>>>>>>> @ProcessElement
>>>>>>>>>>> public void processElement(ProcessContext c) {
>>>>>>>>>>>   myGauge.set(++count);
>>>>>>>>>>>   c.output(c.element());
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> This looks correct, but each worker has their own local 'count'
>>>>>>>>>>> field, and gauge metric updates from parallel workers will 
>>>>>>>>>>> overwrite each
>>>>>>>>>>> other rather than get aggregated. So the final value would be "the 
>>>>>>>>>>> number
>>>>>>>>>>> of elements processed on one of the workers". (The correct 
>>>>>>>>>>> implementation
>>>>>>>>>>> uses a Counter metric).
>>>>>>>>>>>
>>>>>>>>>>> I would be in favor of replacing the existing Gauge.set(long)
>>>>>>>>>>> API with the String version and removing the old one. This would be 
>>>>>>>>>>> a
>>>>>>>>>>> breaking change. However this is a relatively new API and is still 
>>>>>>>>>>> marked
>>>>>>>>>>> @Experimental. Keeping the old API would retain the potential 
>>>>>>>>>>> confusion.
>>>>>>>>>>> It's better to simplify the API surface: having two APIs makes it 
>>>>>>>>>>> less
>>>>>>>>>>> clear which one users should choose.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello all,
>>>>>>>>>>>> As I was working on adding support for Gauges in Dataflow, some
>>>>>>>>>>>> noted that Gauge is a fairly unusual kind of metric for a 
>>>>>>>>>>>> distributed
>>>>>>>>>>>> environment, since many workers will report different values and 
>>>>>>>>>>>> stomp on
>>>>>>>>>>>> each other's all the time.
>>>>>>>>>>>>
>>>>>>>>>>>> We also looked at Flink and Dropwizard Gauge metrics [1][2],
>>>>>>>>>>>> and we found that these use generics, and Flink explicitly 
>>>>>>>>>>>> mentions that a
>>>>>>>>>>>> toString implementation is required[3].
>>>>>>>>>>>>
>>>>>>>>>>>> With that in mind, I'm thinking that it might make sense to 1)
>>>>>>>>>>>> expand Gauge to support string values (keep int-based API for 
>>>>>>>>>>>> backwards
>>>>>>>>>>>> compatibility), and migrate it to use string behind the covers.
>>>>>>>>>>>>
>>>>>>>>>>>> What does everyone think about this?
>>>>>>>>>>>>
>>>>>>>>>>>> Best
>>>>>>>>>>>> -P.
>>>>>>>>>>>>
>>>>>>>>>>>> 1 -
>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
>>>>>>>>>>>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
>>>>>>>>>>>> 3 -
>>>>>>>>>>>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
>>>>>>>>>>>> JIRA issue for Gauge metrics -
>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-1616
>>>>>>>>>>>> --
>>>>>>>>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Got feedback? http://go/swegner-feedback
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>> Got feedback? go/pabloem-feedback
>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>
>>>>>>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>

Re: About the Gauge metric API

Reply via email to