Hi all, I'd like to clarify/modify one aspect of this KIP, which is to record the staleness at the *completion* of the record's processing by the operator/task in question, rather than on the intake. The task-level metrics will be recorded at the sink node instead of at the source, and the operator-level metrics will be recorded at the end of the operation.
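To sketch the difference in code -- every name here is made up for illustration, this is not the actual Streams internals -- the measurement would move to after the operator's work on the record is finished:

import java.util.function.LongConsumer;

class E2eLatencySketch {
    // Record the e2e latency when the record *exits* the operator, i.e. once its
    // processing has completed, so the measurement lines up with the earliest
    // moment the result is actually visible in the output topic or via IQ.
    static void processAndMeasure(final long recordTimestampMs,
                                  final Runnable operatorLogic,
                                  final LongConsumer e2eLatencySensor) {
        operatorLogic.run();                              // the operator's actual work
        final long now = System.currentTimeMillis();      // wallclock at completion
        e2eLatencySensor.accept(now - recordTimestampMs); // event time -> visible result
    }
}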
The stated purpose and intended usefulness of this KIP is to give users a way to gauge roughly how long it takes for a record to be reflected in the "results", whether these results are being read from an output topic or through IQ. To take the IQ example, the results of a record are obviously not visible until *after* that node has finished processing it. The staleness, while still potentially useful as it can impact the *way* a record is processed in a stateful and time-dependent operator, is not part of the problem this KIP specifically set out to solve. In light of this I think it's appropriate to revert the name change back to include latency, since "staleness" as described above only makes sense when measuring relative to the arrival of a record at a task/node. I'd propose to save the term "staleness" for that particular meaning, and adopt Matthias's suggestion of "record-e2e-latency" for this. Thanks for hanging in there, all. Please let me know if you have any concerns about this change! Sophie On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <mj...@apache.org> wrote: > I am also happy with max/min/99/90. And I buy your naming argument about > staleness vs latency. > > > -Matthias > > On 5/15/20 12:24 PM, Boyang Chen wrote: > > Hey Sophie, > > > > 90/99/min/max make sense to me. > > > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman < > sop...@confluent.io> > > wrote: > > > >> @Matthias > >> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark > of > >> the video > >> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293 > >> > >> And that's what he says about the 95th percentile! Imagine what he would > >> say about > >> the 50th :P > >> > >> But seriously, since we can't seem to agree that the mean or 50th > >> percentile is actually > >> useful I'm inclined to resurrect my original proposal: neither. But I > think > >> that's a good > >> argument against the 75th, which I admittedly chose somewhat > arbitrarily as > >> an > >> intermediate between the 50th and the higher percentiles. How about: > >> > >> -max > >> -p99 > >> -p90 > >> -min > >> > >> with p50/mean still up for debate if anyone feels strongly for either of > >> them. > >> > >> Regarding the name, I've already flip-flopped on this so I'm definitely > >> still open to > >> further arguments. But the reason for changing it from > end-to-end-latency > >> (which > >> is similar to what you propose) is that this metric technically reflects > >> how old (i.e. how "stale") > >> the record is when it's *received* by the operator, not when it's > processed > >> by the operator. > >> It seemed like there was the potential for confusion that > >> "end-to-end-latency" might > >> represent the latency from the event creation to the time the processor > is > >> done > >> processing it. > >> > >> @John > >> I'd rather err on the side of "not-enough" metrics as we can always add > >> this to the > >> stateless metrics later on. If we decide to measure the time at every > node > >> and don't > >> find any evidence of a serious performance impact, and users indicate > they > >> would > >> like to see this metric at all nodes, then we can easily start reporting > >> them as well. > >> WDYT? > >> > >> That said, sink nodes seem like a reasonable exception to the rule. > >> Obviously users > >> should be able to detect the time when the record reaches the output > topic > >> but that > >> still leaves a gap in understanding how long the production latency was.
> >> This mirrors > >> the consumption latency that is exposed by the task-level metrics, which > >> are measured > >> at the source node. For good symmetry what if we actually expose both > the > >> source > >> and sink latency at the task-level? i.e. report both sets of statistical > >> measurements with > >> the additional tag -source/-sink > >> > >> @Bill > >> Thanks for the comment regarding the min! I hadn't considered that > >> and it's > >> quite useful to think about how and what is useful from a user's point of > >> view. > >> > >> Regarding your second point, I'm inclined to leave that as an > >> implementation detail > >> but my take would be that the user should be allowed to control the > record > >> timestamp > >> used for this with the timestamp extractor. My impression is that users > may > >> often embed > >> the actual event time in the payload for whatever reason, and this > >> represents the "true" > >> timestamp as far as the Streams topology is concerned. > >> > >> > >> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote: > >> > >>> Thanks for the KIP, Sophie, this will be a useful metric to add. > >>> > >>> Regarding tracking min, I think it could be valuable for users to > >> discern > >>> which parts of their topologies are more efficient since this is a > >>> task-level metric. I realize everyone seems to be on board with > >> including > >>> min anyway, but I wanted to add my 2 cents on this topic should we > decide > >>> to revisit adding min or not. > >>> > >>> I do have a question regarding the calculation of staleness. > >>> Is there going to be a consideration for timestamp extractors? Users > >> could > >>> prefer to use a timestamp embedded in the payload, and it could skew > the > >>> measurements. > >>> I was wondering if we should specify in the KIP if setting the arrival > >> time > >>> is always going to come from the record timestamp, or is this an > >>> implementation detail we can cover in the PR? > >>> > >>> Thanks! > >>> Bill > >>> > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org> > >> wrote: > >>> > >>>> Thanks for the KIP Sophie. > >>>> > >>>> I think it's not useful to record the avg/mean; it's sensitive to > >>>> outliers. We should rather track the median (50th percentile). > >>>> > >>>> Not sure if tracking min is useful, but I am also ok to track it. > >>>> > >>>> However, I find it odd to track the 75th percentile. Standard measures > >> would be > >>>> the 90th or 95th -- I guess we don't need both, so maybe picking 90th > >>>> might be more useful? > >>>> > >>>> About the name: "staleness" sounds really odd, and in fact the metric > >>>> does capture "latency" so we should call it "latency". I understand > the > >>>> issue that we already have a latency metric. So maybe we could call it > >>>> `record-e2e-latency-*` ? > >>>> > >>>> While I agree that we should include out-of-order data (the KIP should > >>>> talk about `out-of-order` data, not `late` data; data is only `late` > if > >>>> it's out-of-order and if it's dropped), I don't really understand why > >>>> the new metric would help to configure grace period or retention time? > >>>> As you mention in the KIP, both are defined as the max difference of > >>>> `event-time - stream-time` and thus the new metric that takes > >>>> system-/wallclock-time into account does not seem to help at all.
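[To make the timestamp-extractor point above concrete: a user who embeds the "true" event time in the payload could plug in something like the sketch below. `OrderEvent` and its `eventTimeMs()` accessor are hypothetical, but `TimestampExtractor` is the real Streams interface, and whatever timestamp it returns would be the one the staleness/e2e-latency measurement is relative to.]

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical payload type carrying the "true" event time.
interface OrderEvent {
    long eventTimeMs();
}

public class PayloadTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final Object value = record.value();
        if (value instanceof OrderEvent) {
            return ((OrderEvent) value).eventTimeMs(); // event time embedded in the payload
        }
        return record.timestamp(); // fall back to the record's own timestamp
    }
}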
> >>>> > >>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> On 5/14/20 7:17 PM, John Roesler wrote: > >>>>> Hi Sophie, > >>>>> > >>>>> It seems like there would still be plenty of use cases for recording > >>>>> this metric at all processors and not just stateful ones, but I'm > >> happy > >>>>> to suspend my arguments for now. Since you're proposing to keep > >>>>> them at the processor-node level, it will be seamless later to add > >>>>> in the stateless processors if we want. As a wise man once said, > >>>>> "Adding is always easier than removing." > >>>>> > >>>>> Regarding the time measurement, it's an implementation detail > >>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly > >>>>> prefer to measure the system time again when recording the > >>>>> metric. I don't think we've seen any evidence that proves this > >>>>> would harm performance, and the amount of inaccuracy that > >>>>> the cached system time could incur is actually substantial. But, > >>>>> if you want to just "not mention this" in the KIP, we can defer to > >>>>> the actual PR discussion, at which time we're in a better position > >>>>> to use benchmarks, etc., to make the call. > >>>>> > >>>>> Along the lines of the measurement accuracy discussion, one > >>>>> minor thought I had is that maybe we should consider measuring > >>>>> the task staleness metric at the sink, rather than the source, so > >> that > >>>>> it includes the processing latency of the task itself, not just the > >>>> latency > >>>>> of everything up to, but not including, the task (which seems > >> confusing > >>>>> for users). I guess this could also be an implementation detail, > >>> though. > >>>>> > >>>>> Thanks for the update, > >>>>> -John > >>>>> > >>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote: > >>>>>> Hey all, > >>>>>> > >>>>>> After discussing with Bruno I'd like to propose a small amendment, > >>>>>> which is to record the processor-node-level metrics only for > >>> *stateful* > >>>>>> *operators*. They would still be considered a "processor-node-level" > >>>>>> metric and not a "state-store-level" metric as the staleness is > >> still > >>>>>> a property of the node rather than of the state itself. However, it > >>>> seems > >>>>>> that this information is primarily useful for stateful operators > >> that > >>>> might > >>>>>> be exposing state via IQ or otherwise dependent on the record time, > >>>>>> unlike a stateless operator. > >>>>>> > >>>>>> It's worth calling out that recent performance improvements to the > >>>> metrics > >>>>>> framework mean that we no longer fetch the system time at the > >> operator > >>>>>> level, but only once per task. In other words the system time is not > >>>> updated > >>>>>> between each process as a record flows through the subtopology, so > >>>>>> debugging the processor-level latency via the staleness will not > >> be > >>>>>> possible. Note that this doesn't mean the operator-level metrics are > >>> not > >>>>>> *useful* relative to the task-level metric. Upstream caching and/or > >>>>>> suppression > >>>>>> can still cause a record's staleness at some downstream stateful > >>>> operator > >>>>>> to deviate from the task-level staleness (recorded at the source > >>> node). > >>>>>> > >>>>>> Please let me know if you have any concerns about this change.
The > >>>>>> KIP has been updated with the new proposal > >>>>>> > >>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io> > >>>> wrote: > >>>>>> > >>>>>>> Hi Sophie, > >>>>>>> > >>>>>>> Thank you for the KIP. > >>>>>>> > >>>>>>> The KIP looks good to me. > >>>>>>> > >>>>>>> 50th percentile: > >>>>>>> I think we do not need it now. If we need it, we can add it. Here > >> the > >>>>>>> old truism applies: Adding is always easier than removing. > >>>>>>> > >>>>>>> processor-node-level metrics: > >>>>>>> I think it is good to have the staleness metrics also on > >>>>>>> processor-node-level. If we do not want to record them on all > >>>>>>> processor nodes, you could restrict the recording to stateful > >>>>>>> processor-nodes, since those are the ones that would benefit most > >>> from > >>>>>>> the staleness metrics. > >>>>>>> > >>>>>>> Best, > >>>>>>> Bruno > >>>>>>> > >>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman < > >>>> sop...@confluent.io> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Yeah, the specific reason was just to align with the current > >>> metrics. > >>>>>>>> > >>>>>>>> Is it better to conform than to be right? History has a lot to say > >>> on > >>>>>>> that > >>>>>>>> matter > >>>>>>>> but I'm not sure how much of it applies to the fine details of > >>> metrics > >>>>>>>> naming :P > >>>>>>>> > >>>>>>>> More seriously, I figured if people are looking at this metric > >>> they're > >>>>>>>> likely to > >>>>>>>> be looking at all the others. Then naming this one "-mean" would > >>>> probably > >>>>>>>> lead some to conclude that the "-avg" suffix in the other metrics > >>> has > >>>> a > >>>>>>>> different meaning. > >>>>>>>> > >>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll > >>>> swap > >>>>>>>> that out > >>>>>>>> > >>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org > >>> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks Sophie, > >>>>>>>>> > >>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose > >>>> "avg" > >>>>>>>>> instead > >>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be > >> oversensitive > >>>>>>> because > >>>>>>>>> of the mistake I just made earlier, but it strikes me that "avg" > >> is > >>>>>>>>> actually > >>>>>>>>> ambiguous, as it refers to a family of statistics, whereas "mean" > >>> is > >>>>>>>>> specific. > >>>>>>>>> I see other Kafka metrics with "avg", but none with "mean"; was > >>> that > >>>>>>> the > >>>>>>>>> reason? If so, I'm +1. > >>>>>>>>> > >>>>>>>>> Regarding the names of the percentile, I actually couldn't find > >>> _any_ > >>>>>>> other > >>>>>>>>> metrics that use percentile. Was there a reason to choose "99th" > >> as > >>>>>>> opposed > >>>>>>>>> to "p99" or any other scheme? This is not a criticism, I'm just > >>>>>>> primarily > >>>>>>>>> asking > >>>>>>>>> for consistency's sake. > >>>>>>>>> > >>>>>>>>> Thanks again, > >>>>>>>>> -John > >>>>>>>>> > >>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote: > >>>>>>>>>> Alright, I can get behind adding the min metric for the sake of > >>>>>>> pretty > >>>>>>>>>> graphs > >>>>>>>>>> (and trivial computation). > >>>>>>>>>> > >>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile) > >>> but I > >>>>>>> can > >>>>>>>>> see > >>>>>>>>>> how users might expect it and find it a bit disorienting not to > >>>>>>> have. 
So > >>>>>>>>> the > >>>>>>>>>> updated proposed metrics are > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> - record-staleness-max [ms] > >>>>>>>>>> - record-staleness-99th [ms] *(99th percentile)* > >>>>>>>>>> - record-staleness-75th [ms] *(75th percentile)* > >>>>>>>>>> - record-staleness-avg [ms] *(mean)* > >>>>>>>>>> - record-staleness-min [ms] > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler < > >> vvcep...@apache.org > >>>> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I > >> guess > >>>>>>> the > >>>>>>>>> mean > >>>>>>>>>>> seems more interesting to me than the median, but neither are > >> as > >>>>>>>>>>> interesting as the higher percentiles (99th and max). > >>>>>>>>>>> > >>>>>>>>>>> Min isn’t really important for any SLAs, but it does round out > >>> the > >>>>>>>>> mental > >>>>>>>>>>> picture of the distribution. I’ve always graphed min along with > >>> the > >>>>>>>>> other > >>>>>>>>>>> metrics to help me understand how fast the system can be, which > >>>>>>> helps > >>>>>>>>> in > >>>>>>>>>>> optimization decisions. It’s also a relatively inexpensive > >> metric > >>>>>>> to > >>>>>>>>>>> compute, so it might be nice to just throw it in. > >>>>>>>>>>> > >>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote: > >>>>>>>>>>>> G1: > >>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the > >>>>>>> specific > >>>>>>>>>>> task" > >>>>>>>>>>>> but > >>>>>>>>>>>> I'm happy with "record-staleness" if that drives the point > >> home > >>>>>>>>> better. > >>>>>>>>>>> So > >>>>>>>>>>>> it's the > >>>>>>>>>>>> "staleness of the record when it is received by that task" -- > >>>>>>> will > >>>>>>>>> update > >>>>>>>>>>>> the KIP > >>>>>>>>>>>> > >>>>>>>>>>>> B1/J: > >>>>>>>>>>>> I'm struggling to imagine a case where the min would actually > >> be > >>>>>>>>> useful, > >>>>>>>>>>>> rather than > >>>>>>>>>>>> just intellectually interesting. I don't feel strongly that we > >>>>>>>>> shouldn't > >>>>>>>>>>>> add it, but that's > >>>>>>>>>>>> why I didn't include it from the start. Can you enlighten me > >>>>>>> with an > >>>>>>>>>>>> example? > >>>>>>>>>>>> > >>>>>>>>>>>> I was also vaguely concerned about the overhead of adding > >>>>>>> multiple > >>>>>>>>>>>> percentile > >>>>>>>>>>>> metrics. Do we have any data to indicate what kind of > >>> performance > >>>>>>>>> hit we > >>>>>>>>>>>> take on > >>>>>>>>>>>> metrics computation? > >>>>>>>>>>>> > >>>>>>>>>>>> Also, not to be too pedantic but the 50th percentile would be > >>> the > >>>>>>>>> median > >>>>>>>>>>>> not the > >>>>>>>>>>>> mean. Would you propose to add the mean *and* the 50th > >>>>>>> percentile, or > >>>>>>>>>>> just > >>>>>>>>>>>> one > >>>>>>>>>>>> of the two? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks all! > >>>>>>>>>>>> Sophie > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler < > >>>>>>> vvcep...@apache.org> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Just some comments on the discussion so far: > >>>>>>>>>>>>> > >>>>>>>>>>>>> B2/G1: > >>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans" > >> or > >>>>>>>>>>>>> "end-to-end" latency. 
But in practice, some of the spans are > >>>>>> pretty > >>>>>>>>>>>>> difficult to really measure (like time spent waiting in the > >>>>>>>>> topics, or > >>>>>>>>>>>>> time from the event happening to the ETL producer choosing to > >>>>>> send > >>>>>>>>> it, > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.). > >>>>>>>>>>>>> > >>>>>>>>>>>>> In other words, it's practically easier to compute spans by > >>>>>>>>> subtracting > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by adding > >>>>>> spans. > >>>>>>>>> You > >>>>>>>>>>>>> can even just consider that the span computation from e2e > >>>>>> always > >>>>>>>>> just > >>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e > latency > >>>>>>>>> from > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the end > you > >>>>>>>>> care > >>>>>>>>>>> about. > >>>>>>>>>>>>> > >>>>>>>>>>>>> It seems like people really prefer to have spans when they > are > >>>>>>>>>>> debugging > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more general > >>>>>> measurement > >>>>>>>>>>>>> that basically every person/application cares about and > should > >>>>>> be > >>>>>>>>>>>>> monitoring. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Altogether, it really seems to provide more value to more > >>>>>> people if > >>>>>>>>> we > >>>>>>>>>>>>> report > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I > think > >>>>>> I > >>>>>>>>> have > >>>>>>>>>>> no > >>>>>>>>>>>>> preference, I'd defer to other people's intuition. > >>>>>>>>>>>>> > >>>>>>>>>>>>> G2: > >>>>>>>>>>>>> I think the processor-node metric is nice, since the inside > of > >>>>>> a > >>>>>>>>> task > >>>>>>>>>>> can > >>>>>>>>>>>>> introduce a significant amount of latency in some cases. > Plus, > >>>>>>>>> it's a > >>>>>>>>>>> more > >>>>>>>>>>>>> direct measurement, if you really wanted to know (for the > >>>>>> purposes > >>>>>>>>> of > >>>>>>>>>>> IQ > >>>>>>>>>>>>> or something) how long it takes source events to "show up" at > >>>>>> the > >>>>>>>>>>> store. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think actually recording it at every processor could be > >>>>>>>>> expensive, > >>>>>>>>>>> but we > >>>>>>>>>>>>> already record a bunch of metrics at the node level. > >>>>>>>>>>>>> > >>>>>>>>>>>>> B1: > >>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's > a > >>>>>> poor > >>>>>>>>>>> metric > >>>>>>>>>>>>> for operational purposes, a lot of people might expect to see > >>>>>>>>> "mean". > >>>>>>>>>>>>> Actually, > >>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to leave > it > >>>>>>>>> off? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I might suggest: > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max > >>>>>>>>>>>>> > >>>>>>>>>>>>> B3: > >>>>>>>>>>>>> I agree we should include late records (though not the ones > we > >>>>>>>>> drop). > >>>>>>>>>>>>> It may be spiky, but only when there are legitimately some > >>>>>> records > >>>>>>>>>>> with a > >>>>>>>>>>>>> high end-to-end latency, which is the whole point of these > >>>>>> metrics. > >>>>>>>>>>>>> > >>>>>>>>>>>>> That's it! I don't think I have any other feedback, other > than > >>>>>> a > >>>>>>>>>>> request to > >>>>>>>>>>>>> also report "min".
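[On the percentile-overhead question raised earlier: kafka-clients already ships a bucket-based `Percentiles` stat, so the proposed min/max/p90/p99 set could be wired up roughly as in the sketch below. The sensor, metric, and group names plus the bucket sizing are illustrative only, not necessarily what the KIP implementation would use; the main runtime cost is maintaining the fixed-size histogram.]

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Max;
import org.apache.kafka.common.metrics.stats.Min;
import org.apache.kafka.common.metrics.stats.Percentile;
import org.apache.kafka.common.metrics.stats.Percentiles;
import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;

public class E2eLatencySensorSketch {
    public static void main(final String[] args) {
        final Metrics metrics = new Metrics();
        final String group = "stream-task-metrics"; // illustrative group name

        final Sensor sensor = metrics.sensor("record-e2e-latency");
        sensor.add(metrics.metricName("record-e2e-latency-max", group), new Max());
        sensor.add(metrics.metricName("record-e2e-latency-min", group), new Min());
        // Percentiles are approximated from a fixed histogram, bounded above by
        // the largest value the buckets can represent (here 10 minutes, in ms).
        sensor.add(new Percentiles(
                4_000,           // bytes allotted to the histogram buckets
                10 * 60 * 1000d, // upper bound on representable latencies
                BucketSizing.LINEAR,
                new Percentile(metrics.metricName("record-e2e-latency-p99", group), 99),
                new Percentile(metrics.metricName("record-e2e-latency-p90", group), 90)));

        sensor.record(123d); // one record's e2e latency, in ms
        metrics.close();
    }
}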
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> -John > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote: > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing > >>>>>> latency > >>>>>>>>> of the > >>>>>>>>>>>>> task > >>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I have > >>>>>> a > >>>>>>>>>>> similar > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency metric of > >>>>>> a > >>>>>>>>> task A > >>>>>>>>>>>>>> actually measures the latency of the sub-topology up to but > >>>>>> not > >>>>>>>>>>> including > >>>>>>>>>>>>>> the processing of A, which is a bit weird. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe my feeling comes from the name "latency" itself, > >>>>>> since > >>>>>>>>>>> today we > >>>>>>>>>>>>>> already have several "latency" metrics which are > >>>>>>>>> measuring > >>>>>>>>>>>>> elapsed > >>>>>>>>>>>>>> system-time for processing a record / etc, while here we are > >>>>>>>>>>> comparing > >>>>>>>>>>>>> the > >>>>>>>>>>>>>> system wallclock time with the record timestamp. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness" > >>>>>> (note we > >>>>>>>>>>> already > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case recording at > >>>>>> the > >>>>>>>>>>>>>> system-time before we start processing the record sounds > more > >>>>>>>>>>> natural. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the > >>>>>> processor-node-level > >>>>>>>>> DEBUG > >>>>>>>>>>>>>> metric is worth adding, given that we already have a > >>>>>> task-level > >>>>>>>>>>>>> processing > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e latency is > >>>>>>>>> similar > >>>>>>>>>>> to > >>>>>>>>>>>>> the > >>>>>>>>>>>>>> task-level e2e latency + task-level processing latency. > >>>>>>>>> Personally I > >>>>>>>>>>>>> think > >>>>>>>>>>>>>> having a task-level record-staleness metric is sufficient. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman < > >>>>>>>>>>>>> sop...@confluent.io> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for > >>>>>> this > >>>>>>>>>>>>> specific > >>>>>>>>>>>>>>> metric, as > >>>>>>>>>>>>>>> it's presumably most useful at putting an *upper* bound on > >>>>>> the > >>>>>>>>>>> latency > >>>>>>>>>>>>> you > >>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles that would > >>>>>>>>> hopefully > >>>>>>>>>>>>> give a > >>>>>>>>>>>>>>> good > >>>>>>>>>>>>>>> sense of what *most* records will experience, and what > >>>>>> *close > >>>>>>>>> to > >>>>>>>>>>> all* > >>>>>>>>>>>>>>> records > >>>>>>>>>>>>>>> will. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> However I'm not married to these specific numbers and > >>>>>> could be > >>>>>>>>>>>>> convinced. > >>>>>>>>>>>>>>> Would be especially interested in hearing from users on > >>>>>> this. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in > >>>>>>>>> this KIP > >>>>>>>>>>>>> since > >>>>>>>>>>>>>>> users > >>>>>>>>>>>>>>> can always compute it themselves by subtracting the > >>>>>> previous > >>>>>>>>> node's > >>>>>>>>>>>>>>> end-to-end latency.
I guess we could do it either way since > >>>>>>>>> you can > >>>>>>>>>>>>> always > >>>>>>>>>>>>>>> compute one from the other, but I think the end-to-end > >>>>>> latency > >>>>>>>>>>> feels > >>>>>>>>>>>>> more > >>>>>>>>>>>>>>> valuable as its main motivation is not to debug > >>>>>> bottlenecks > >>>>>>>>> in the > >>>>>>>>>>>>>>> topology but > >>>>>>>>>>>>>>> to give users a sense of how long it takes a record to be > >>>>>>>>> reflected > >>>>>>>>>>> in > >>>>>>>>>>>>>>> certain parts > >>>>>>>>>>>>>>> of the topology. For example this might be useful for users > >>>>>>>>> who are > >>>>>>>>>>>>>>> wondering > >>>>>>>>>>>>>>> roughly when a record that was just produced will be > >>>>>> included > >>>>>>>>> in > >>>>>>>>>>> their > >>>>>>>>>>>>> IQ > >>>>>>>>>>>>>>> results. > >>>>>>>>>>>>>>> Debugging is just a nice side effect -- but maybe I didn't > >>>>>> make > >>>>>>>>>>> that > >>>>>>>>>>>>> clear > >>>>>>>>>>>>>>> enough > >>>>>>>>>>>>>>> in the KIP's motivation. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The > >>>>>> short > >>>>>>>>>>> answer is > >>>>>>>>>>>>>>> "yes", > >>>>>>>>>>>>>>> we will include late records. I added a paragraph to the > >>>>>> end > >>>>>>>>> of the > >>>>>>>>>>>>>>> Proposed > >>>>>>>>>>>>>>> Changes section explaining the reasoning here, please let > >>>>>> me > >>>>>>>>> know > >>>>>>>>>>> if > >>>>>>>>>>>>> you > >>>>>>>>>>>>>>> have > >>>>>>>>>>>>>>> any concerns. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric > >>>>>>>>>>> "process-latency", > >>>>>>>>>>>>> that > >>>>>>>>>>>>>>> metric > >>>>>>>>>>>>>>> reflects the time for the literal Node#process method to > >>>>>> run > >>>>>>>>>>> whereas > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>> metric > >>>>>>>>>>>>>>> would always be measured relative to the event timestamp. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing > >>>>>> so > >>>>>>>>> I've > >>>>>>>>>>>>> renamed > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency", which I feel > >>>>>> better > >>>>>>>>>>>>> reflects > >>>>>>>>>>>>>>> the nature > >>>>>>>>>>>>>>> of the metric anyway. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks for the feedback! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen < > >>>>>>>>>>>>> reluctanthero...@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is > >>>>>>>>> important > >>>>>>>>>>> for > >>>>>>>>>>>>>>>> understanding the bottleneck of the application. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> A couple of questions and ideas: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 1. Could you clarify the rationale for picking 75, 99 and > >>>>>> max > >>>>>>>>>>>>> percentiles? > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as > >>>>>> well > >>>>>>>>> in > >>>>>>>>>>>>> production > >>>>>>>>>>>>>>>> systems. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 2.
The current latency being computed is cumulative, i.e. > >>>>> if a > >>>>> record > >>>>>>>>> goes > >>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) = > >>>>>> T(B->C) + > >>>>>>>>>>> T(A->B) + > >>>>>>>>>>>>>>> T(A) > >>>>>>>>>>>>>>>> and so on, where P() represents the captured latency, > >>>>>> and T() > >>>>>>>>>>>>> represents > >>>>>>>>>>>>>>>> the time for transiting the records between two nodes, > >>>>>>>>> including > >>>>>>>>>>>>>>> processing > >>>>>>>>>>>>>>>> time. For monitoring purposes, maybe T(B->C) and > >>>>>>>>> T(A->B) > >>>>>>>>>>> are > >>>>>>>>>>>>> more > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if > >>>>>> there > >>>>>>>>> is a > >>>>>>>>>>>>> spike in > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at the same > >>>>>> time. > >>>>>>>>> In the > >>>>>>>>>>>>> same > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the > >>>>>> record > >>>>>>>>> exits > >>>>>>>>>>>>> from the > >>>>>>>>>>>>>>>> sink as this marks the whole time this record spent > >>>>>> inside > >>>>>>>>> the > >>>>>>>>>>>>> funnel. Do > >>>>>>>>>>>>>>>> you think we could have separate treatment for sink > >>>>>> nodes and > >>>>>>>>>>> other > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time receiving > >>>>>> the > >>>>>>>>>>> record > >>>>>>>>>>>>> from > >>>>>>>>>>>>>>>> the last hop? I'm not proposing a solution here, just want to > >>>>>>>>> discuss > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>>> alternative to see if it is reasonable. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as > >>>>>> well, > >>>>>>>>> they > >>>>>>>>>>>>> would > >>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order > >>>>>>>>> records are > >>>>>>>>>>>>>>>> interleaving with on-time records. Should we also supply > >>>>>> a > >>>>>>>>> smooth > >>>>>>>>>>>>> version > >>>>>>>>>>>>>>>> of the latency metrics, or should users just take care of > >>>>>> it > >>>>>>>>> by > >>>>>>>>>>>>> themselves? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't discussed its > >>>>>>>>> relation > >>>>>>>>>>>>> with our > >>>>>>>>>>>>>>>> existing processing latency metrics, could you add some > >>>>>>>>> context > >>>>>>>>>>> on > >>>>>>>>>>>>>>>> the comparison and a simple `when to use which` > >>>>>>>>>>> tutorial? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Boyang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman < > >>>>>>>>>>>>> sop...@confluent.io > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hey all, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613 which aims > >>>>>> to > >>>>>>>>> add > >>>>>>>>>>>>> end-to-end > >>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a look: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>> Sophie > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>>> > >>> > >> > > > >