There are cases where we may want to update a Gauge frequently, especially from a single worker. The example I always think of is progress in a Kafka partition: a partition is consumed by a single worker at a time, updated by a single worker at a time, and updated frequently.
Max, do you have a use case in mind? I think what you propose is reasonable, but I can't think of a use case that requires updates much more frequent than every few seconds. I also worry about increasing the size of the SDK updates to a potentially huge amount (what if someone writes a DoFn that updates a gauge on every element?)

On Tue, Oct 15, 2019 at 10:23 AM Alex Amato <ajam...@google.com> wrote:

> Would you elaborate on what you are expecting the behaviour to look like?
> Ideally your runner would export gauges at a periodic interval.
>
> The design of gauge is inherently unable to handle multiple updates to it
> around the same time.
>
> Consider the case of multiple machines reporting the gauge at the same
> time. You can pick the one with the largest timestamp on each machine.
> But when reported to a central metrics service, it cannot compare
> timestamps in a meaningful way, since they come from different machines
> with out-of-sync clocks. Racy threads can be an issue as well (multiple
> bundles reporting separate values for the gauge, with the order arbitrary
> based on thread execution order, even on the same machine).
>
> The current thinking around this, IIRC, is to document this and make it
> clear in the usage of gauge:
>
> 1. Gauges should only be used for values which are updated infrequently.
> 2. Different gauge values reported from different workers near the same
>    time cannot be reliably aggregated together into a single "most
>    recent" value.
>
> On Tue, Oct 15, 2019 at 9:55 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi,
>>
>> While adding metrics for the Python state cache [1], I was wondering
>> about the story of Gauges in Beam. It seems like we only keep one value
>> at a time and use a combiner [2] that replaces an old, possibly
>> not-yet-reported gauge result with a newer gauge result based on their
>> timestamps.
>>
>> This behavior is an issue because, if the SDK reports faster than the
>> runner queries, metrics will just be swallowed. Gauges seem important to
>> get right because users often want to see all the values, e.g. in case
>> of spikes in the data.
>>
>> What do you think about keeping all gauge values until they are reported?
>>
>> Thanks,
>> Max
>>
>> [1] https://github.com/apache/beam/pull/9769
>> [2] https://github.com/apache/beam/blob/fa74467b82e78962e9f170ad0d95fa6b345add67/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java#L134
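Alex's cross-machine point can be sketched in a few lines of plain Python. This is a minimal illustration, not Beam's actual classes: a latest-timestamp-wins combine (as in the combiner linked as [2]) silently prefers a stale value when machine clocks are skewed.

```python
from collections import namedtuple

# Illustrative stand-in for a gauge result; not Beam's real class names.
GaugeResult = namedtuple("GaugeResult", ["value", "timestamp_ms"])

def combine(a, b):
    # Latest-timestamp-wins, mirroring the combiner referenced as [2].
    return a if a.timestamp_ms >= b.timestamp_ms else b

# Machine A's clock runs 30 s fast: it set value 10 at real time 0 but
# stamped it 30_000. Machine B set value 42 at real time 5_000 with an
# accurate clock, so 42 is the genuinely newer reading.
stale = GaugeResult(value=10, timestamp_ms=30_000)  # skewed clock
fresh = GaugeResult(value=42, timestamp_ms=5_000)   # truly newer update

picked = combine(stale, fresh)
print(picked.value)  # 10 -- the stale value wins on timestamp alone
```

The central service has no way to tell which timestamp is trustworthy, which is why documenting gauges as "infrequently updated, not reliably aggregatable" is the current thinking.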
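The swallowing Max describes can also be sketched directly. `GaugeCell` here is a hypothetical keep-only-latest cell, not Beam's real implementation: if the SDK sets the gauge more often than the runner queries it, every intermediate value is overwritten before anyone reads it.

```python
class GaugeCell:
    """Minimal keep-only-latest gauge cell (illustrative sketch, not
    Beam's actual GaugeCell)."""

    def __init__(self):
        self._latest = None

    def set(self, value):
        # Each update overwrites the previous one; nothing is buffered.
        self._latest = value

    def get_cumulative(self):
        return self._latest

gauge = GaugeCell()
# The SDK worker updates the gauge once per element; 120 is a spike the
# user would want to see.
for queue_size in [3, 5, 120, 4, 2]:
    gauge.set(queue_size)

# The runner only queries after the loop, so the spike (and every other
# intermediate value) has been swallowed.
reported = gauge.get_cumulative()
print(reported)  # 2
```

Keeping all values until they are reported, as Max proposes, would preserve the spike, at the cost of unbounded buffering if a DoFn updates the gauge on every element.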