1. I felt that 50% was not a particularly useful gauge for this specific metric, as it's presumably most useful at putting an *upper* bound on the latency you can reasonably expect to see. I chose percentiles that would hopefully give a good sense of what *most* records will experience, and what *close to all* records will.

However, I'm not married to these specific numbers and could be convinced otherwise. I'd be especially interested in hearing from users on this.

2. I'm inclined not to include the "hop-to-hop latency" in this KIP, since users can always compute it themselves by subtracting the previous node's end-to-end latency (sketched below). I guess we could do it either way since you can always compute one from the other, but I think the end-to-end latency feels more valuable: its main motivation is not to debug bottlenecks in the topology but to give users a sense of how long it takes a record to be reflected in certain parts of the topology. For example, this might be useful for users who are wondering roughly when a record that was just produced will be included in their IQ results. Debugging is just a nice side effect -- but maybe I didn't make that clear enough in the KIP's motivation.
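
To make that concrete, here's a purely illustrative example using the P()/T() notation from your question (the numbers are made up):

    If node B reports an end-to-end latency of P(B) = 40ms and node C reports
    P(C) = 55ms, then the hop-to-hop latency between them is just the
    difference: T(B->C) = P(C) - P(B) = 55ms - 40ms = 15ms.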

3. Good question, I should address this in the KIP. The short answer is "yes", we will include late records. I added a paragraph to the end of the Proposed Changes section explaining the reasoning; please let me know if you have any concerns.

4. Assuming you're referring to the existing metric "process-latency": that metric reflects the time for the literal Node#process method to run, whereas this metric would always be measured relative to the event timestamp.

That said, the naming collision there is pretty confusing, so I've renamed the metrics in this KIP to "end-to-end-latency", which I feel better reflects the nature of the metric anyway.
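
To spell out the distinction (a simplified sketch only -- the variable names below are just for illustration, not actual code from the KIP or the codebase):

    // process-latency: wall-clock duration of the Node#process call itself
    long processLatencyMs = wallClockAfterProcessMs - wallClockBeforeProcessMs;

    // end-to-end-latency: age of the record when it reaches this point in the
    // topology, measured against its event timestamp
    long endToEndLatencyMs = currentWallClockMs - recordEventTimestampMs;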

Thanks for the feedback!

On Wed, May 13, 2020 at 10:21 AM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Thanks for the KIP Sophie. Getting the E2E latency is important for
> understanding the bottleneck of the application.
>
> A couple of questions and ideas:
>
> 1. Could you clarify the rationale for picking the 75th, 99th, and max percentiles?
> Normally I see cases where we use 50, 90 percentile as well in production
> systems.
>
> 2. The current latency being computed is cumulative, i.e. if a record goes
> through A -> B -> C, then P(C) = T(B->C) + P(B) = T(B->C) + T(A->B) + T(A)
> and so on, where P() represents the captured latency, and T() represents
> the time for transiting the records between two nodes, including processing
> time. For monitoring purpose, maybe having T(B->C) and T(A->B) are more
> natural to view as "hop-to-hop latency", otherwise if there is a spike in
> T(A->B), both P(B) and P(C) are affected at the same time. In the same
> spirit, the E2E latency is meaningful only when the record exits from the
> sink as this marks the whole time this record spent inside the funnel. Do
> you think we could have separate treatment for sink nodes and other
> nodes, so that other nodes only count the time since receiving the record
> from the last hop? I'm not proposing a solution here, just want to discuss this
> alternative to see if it is reasonable.
>
> 3. As we are going to monitor late arrival records as well, they would
> create some really spiky graphs when the out-of-order records are
> interleaving with on-time records. Should we also supply a smoothed version
> of the latency metrics, or should users just take care of that themselves?
>
> 4. Regarding this new metrics, we haven't discussed its relation with our
> existing processing latency metrics, could you add some context on the
> comparison and a simple `when to use which` tutorial?
>
> Boyang
>
> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
>
> > Hey all,
> >
> > I'd like to kick off discussion on KIP-613 which aims to add end-to-end
> > latency metrics to Streams. Please take a look:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
> >
> > Cheers,
> > Sophie
> >
>
