On Fri, Jan 20, 2017 at 1:18 AM Ben Chambers <bchamb...@google.com.invalid> wrote:
> Part of the problem here is that whether attempted or committed is "what
> matters" depends on the metric. If you're counting the number of RPCs to an
> external service, you may want all the attempts (to predict your bill). If
> you're tracking how long those RPCs took, you may want just the committed
> ones (e.g., what is the best-case time to execute your pipeline). This is
> essentially Luke's case of wanting one or the other.

This sounds like Metrics should have a user-defined guarantee level, which
might make more sense - Metrics.counter().attempted()/committed() - though
this might prove more challenging for runners to implement.

> Regarding the step names -- the metrics are reported using the full step
> name, which is also made unique during the Graph API. So "My
> Composite/ParDo(MyDoFn)" or "My Composite/ParDo(MyDoFn)2" if there are
> multiple instances within the same composite. Specifically -- the names are
> made unique prior to recording metrics, so there are no double counts.

But how would the user know that? I'm afraid this could be confusing as a
user-facing query API, and I think most users would simply name metrics
differently.

> On Thu, Jan 19, 2017 at 1:57 PM Amit Sela <amitsel...@gmail.com> wrote:
>
> > I think Luke's example is interesting, but I wonder how common it
> > is/would be? I'd expect failures to happen, but not at a rate so dramatic
> > that it would be interesting to follow applicatively (I'd expect the
> > runner/cluster to properly monitor uptime of processes/nodes separately).
> > And even if it is useful, I can't think of other use cases.
> >
> > I thought the idea was to "declare" the Metrics guarantee level in the
> > query API, but the more I think about it the more I tend to let it go,
> > for the following reasons:
> >
> > - Setting aside Luke's example, I think users would prefer the best
> > guarantee a runner can provide. And on that note, I'd expect a
> > "getMetrics" API and not have to figure out guarantees.
> > - Programmatic querying would "break" (UnsupportedOperationException)
> > portability if a program that was running on a runner that supports
> > committed() tried to execute on a runner that only supports attempted().
> > - I know that portability is for the Pipeline and this is post-execution,
> > but still, call it a 25% portability issue ;-).
> > - According to the Capability Matrix, all runners fail to provide a
> > "commit" guarantee for Aggregators. I can only speak for Spark, saying
> > that supporting the Metrics API relies on the same underlying mechanism,
> > and so nothing will change. I wonder about other runners - does anyone
> > plan to support "commit" guarantees for Metrics soon? Having said that,
> > I'm not sure this is a good reason not to have this as a placeholder.
> >
> > Another question for querying Metrics: querying by step could be a bit
> > tricky, since a runner is expected to keep unique naming/ids for steps,
> > but users are supposed to be aware of this here, and I'd suspect users
> > might not follow; if they use the same ParDo in a couple of places,
> > they'll query it and it might be confusing for them to see "double
> > counts" they didn't mean for.
> >
> > Amit.
> >
> > On Thu, Jan 19, 2017 at 7:36 PM Ben Chambers
> > <bchamb...@google.com.invalid> wrote:
> >
> > > Thanks for starting the discussion! I'm going to hold off saying what I
> > > think and instead just provide some background and additional
> > > questions, because I want to see where the discussion goes.
> > >
> > > When I first suggested the API for querying metrics I was adding it for
> > > parity with aggregators. A good first question might be: does the
> > > pipeline result even need query methods? Runners could add them as
> > > necessary based on the levels of querying they support.
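To make the attempted/committed distinction above concrete, here is a toy
model (illustrative names only, not the Beam API or any runner's code) of a
counter updated by workers that may retry a bundle: every attempt bumps the
attempted value, while only bundles whose results commit contribute to the
committed value, so attempted >= committed whenever retries happen.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only -- not the Beam API. Shows why attempted >= committed when
// bundles are retried: every attempt increments `attempted` (RPC/bill-style
// accounting), but `committed` only reflects bundles that commit (success-path
// accounting).
public class RetryCounterDemo {
    static class ToyCounter {
        final AtomicLong attempted = new AtomicLong();
        final AtomicLong committed = new AtomicLong();

        // Simulate processing a bundle of `n` elements; `succeeds` says
        // whether the runner commits this bundle's results.
        void processBundle(long n, boolean succeeds) {
            attempted.addAndGet(n);      // counted on every attempt
            if (succeeds) {
                committed.addAndGet(n);  // counted only on commit
            }
        }
    }

    public static void main(String[] args) {
        ToyCounter c = new ToyCounter();
        c.processBundle(10, false);  // first attempt fails, work is retried
        c.processBundle(10, true);   // retry succeeds and commits
        System.out.println("attempted=" + c.attempted.get()
                + " committed=" + c.committed.get());
        // attempted=20 committed=10
    }
}
```

In this picture, counting RPCs for billing wants `attempted`, while measuring
best-case pipeline behavior wants `committed` -- which is exactly why a single
number cannot serve both questions.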
> > > The other desire was to make the accuracy clear. One implementation
> > > path was reporting metrics directly from the workers while attempting
> > > work. This can overcount when retrying, and may be under the actual
> > > attempts if the worker lost connectivity before reporting.
> > >
> > > Another implementation was something like a side output where the
> > > counts are committed as part of each bundle's results, and then
> > > aggregated. This committed value is more accurate and represents the
> > > value that occurred along the success path of the pipeline.
> > >
> > > I suspect there are other possible implementations, so trying to make
> > > an API that expresses all of them is difficult. So:
> > >
> > > 1. Does the pipeline result need to support querying (which is useful
> > > for programmatic consumption) or are metrics intended only to get
> > > values out of a pipeline and into some metrics store?
> > >
> > > 2. How should pipeline results indicate the different kinds of metrics?
> > > What if a runner supported multiple kinds (e.g., the runner reports
> > > both attempted and committed results)? As Luke mentions, it may be
> > > useful to look at both to understand how much retries affected the
> > > value.
> > >
> > > On Thu, Jan 19, 2017, 1:42 AM Aviem Zur <aviem...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > While working on the implementation of the metrics API in the Spark
> > > > runner, the question of committed vs. attempted results has come up,
> > > > sparking (no pun intended) an interesting conversation.
> > > > (See API: MetricResult
> > > > <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
> > > > and discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)
> > > >
> > > > The separation of `attempted` and `committed` metric results seems a
> > > > bit unclear.
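The second implementation Ben describes -- counts committed alongside each
bundle's results, then aggregated -- could be sketched roughly like this (a
toy illustration with made-up names, not how any runner actually does it):
each bundle result carries its metric deltas, and only deltas from committed
bundles are folded into the totals, so a failed attempt contributes nothing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of committed-metrics accounting (not runner code): each bundle
// result carries its metric deltas, and the aggregation folds in deltas only
// from bundles that committed, so failed attempts never reach the totals.
public class CommittedMetricsSketch {
    static class BundleResult {
        final Map<String, Long> deltas;  // counter name -> delta from this bundle
        final boolean committed;         // did this attempt commit?

        BundleResult(Map<String, Long> deltas, boolean committed) {
            this.deltas = deltas;
            this.committed = committed;
        }
    }

    static Map<String, Long> aggregateCommitted(List<BundleResult> results) {
        Map<String, Long> totals = new HashMap<>();
        for (BundleResult r : results) {
            if (!r.committed) {
                continue;  // drop deltas from failed attempts
            }
            r.deltas.forEach((name, d) -> totals.merge(name, d, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<BundleResult> results = List.of(
                new BundleResult(Map.of("rpcs", 7L), false),  // failed attempt
                new BundleResult(Map.of("rpcs", 7L), true),   // successful retry
                new BundleResult(Map.of("rpcs", 3L), true));
        System.out.println(aggregateCommitted(results));  // {rpcs=10}
    }
}
```

A worker-side reporting scheme, by contrast, would have summed all three
bundles (17 rather than 10), which is the overcount-on-retry behavior of the
first implementation path.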
> > > > Seeing that current implementations of aggregators in the different
> > > > runners do not guarantee correctness, one could assume that the
> > > > metrics API implementations will also follow the same guarantees.
> > > >
> > > > If this is correct, then you could assume that only `attempted()`
> > > > metric results can be fulfilled.
> > > > Would it then be better to just have a single method such as `get()`
> > > > in the API, and have the guarantees of each runner explained in the
> > > > capability matrix / documentation?