There are cases where we may want to update a Gauge frequently, especially from a single worker. The example I always think of is progress in a Kafka partition: a partition is consumed by a single worker at a time, updated by a single worker at a time, and updated frequently.
Max, do you have a use case in mind? I think what you propose is reasonable, but I can't think of a use case that requires updates much more frequent than every few seconds. I also worry about increasing the size of the SDK updates to a potentially huge amount (what if someone writes a DoFn that updates a gauge on every element?)

On Tue, Oct 15, 2019 at 10:23 AM Alex Amato <ajam...@google.com> wrote:

> Would you elaborate on what you are expecting the behaviour to look like?
> Ideally your runner would export gauges at a periodic interval.
>
> The design of gauge is inherently unable to handle multiple updates to it
> around the same time.
>
> Consider the case of multiple machines reporting the gauge at the same
> time. You can pick the one with the largest timestamp on each machine.
> But when reported to a central metrics service, it cannot compare
> timestamps in a meaningful way, since they come from different machines
> with out-of-sync clocks. Racy threads can be an issue as well (multiple
> bundles reporting separate values for the gauge, with the order arbitrary
> based on thread execution order, even on the same machine).
>
> The current thinking around this, IIRC, is to document this and make it
> clear in the usage of gauge:
>
> 1. Gauges should only be used for values which are updated infrequently.
> 2. Different gauge values reported from different workers near the same
>    time cannot be reliably aggregated together into a single "most
>    recent" value.
>
> On Tue, Oct 15, 2019 at 9:55 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi,
>>
>> While adding metrics for the Python state cache [1], I was wondering
>> about the story of Gauges in Beam. It seems like we only keep one value
>> at a time and use a combiner [2] that replaces an old, possibly
>> not-yet-reported gauge result with a newer gauge result based on their
>> timestamps.
>>
>> This behavior is an issue because, if the SDK reports faster than the
>> runner queries, metrics will just be swallowed. Gauges seem important to
>> get right because users often want to see all the values, e.g. in case
>> of spikes in the data.
>>
>> What do you think about keeping all gauge values until they are reported?
>>
>> Thanks,
>> Max
>>
>> [1] https://github.com/apache/beam/pull/9769
>> [2] https://github.com/apache/beam/blob/fa74467b82e78962e9f170ad0d95fa6b345add67/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java#L134
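Alex's cross-machine point can be sketched in a few lines of plain Python. This is a minimal illustration, not Beam's actual classes: a latest-timestamp-wins combine (as in the combiner linked as [2]) silently prefers a stale value when machine clocks are skewed.

```python
from collections import namedtuple

# Illustrative stand-in for a gauge result; not Beam's real class names.
GaugeResult = namedtuple("GaugeResult", ["value", "timestamp_ms"])

def combine(a, b):
    # Latest-timestamp-wins, mirroring the combiner referenced as [2].
    return a if a.timestamp_ms >= b.timestamp_ms else b

# Machine A's clock runs 30 s fast: it set value 10 at real time 0 but
# stamped it 30_000. Machine B set value 42 at real time 5_000 with an
# accurate clock, so 42 is the genuinely newer reading.
stale = GaugeResult(value=10, timestamp_ms=30_000)  # skewed clock
fresh = GaugeResult(value=42, timestamp_ms=5_000)   # truly newer update

picked = combine(stale, fresh)
print(picked.value)  # 10 -- the stale value wins on timestamp alone
```

The central service has no way to tell which timestamp is trustworthy, which is why documenting gauges as "infrequently updated, not reliably aggregatable" is the current thinking.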
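The swallowing Max describes can also be sketched directly. `GaugeCell` here is a hypothetical keep-only-latest cell, not Beam's real implementation: if the SDK sets the gauge more often than the runner queries it, every intermediate value is overwritten before anyone reads it.

```python
class GaugeCell:
    """Minimal keep-only-latest gauge cell (illustrative sketch, not
    Beam's actual GaugeCell)."""

    def __init__(self):
        self._latest = None

    def set(self, value):
        # Each update overwrites the previous one; nothing is buffered.
        self._latest = value

    def get_cumulative(self):
        return self._latest

gauge = GaugeCell()
# The SDK worker updates the gauge once per element; 120 is a spike the
# user would want to see.
for queue_size in [3, 5, 120, 4, 2]:
    gauge.set(queue_size)

# The runner only queries after the loop, so the spike (and every other
# intermediate value) has been swallowed.
reported = gauge.get_cumulative()
print(reported)  # 2
```

Keeping all values until they are reported, as Max proposes, would preserve the spike, at the cost of unbounded buffering if a DoFn updates the gauge on every element.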