Thanks, Sophie!

I think this makes perfect sense. Measured this way, the metric will be much
more intuitive to use for its stated motivation. I'd be in favor of the
proposal after this update.

Thanks again for taking this on,
-John

On Fri, May 15, 2020, at 17:07, Sophie Blee-Goldman wrote:
> Hi all,
> 
> I'd like to clarify/modify one aspect of this KIP, which is to record the
> staleness
> at the *completion* of the record's processing by the operator/task in
> question,
> rather than on the intake. The task-level metrics will be recorded at the
> sink
> node instead of at the source, and the operator-level metrics will be
> recorded
> at the end of the operation.
> 
> The stated purpose and intended usefulness of this KIP is to give users a
> way
> to gauge roughly how long it takes for a record to be reflected in the
> "results",
> whether these results are being read from an output topic or through IQ. To
> take the IQ example, the results of a record are obviously not visible until
> *after* that node has finished processing it. The staleness, while still
> potentially
> useful as it can impact the *way* a record is processed in a stateful and
> time-
> dependent operator, is not part of the problem this KIP specifically set out
> to solve.
> 
> In light of this I think it's appropriate to revert the name change back to
> include
> latency, since "staleness" as described above only makes sense when
> measuring
> relative to the arrival of a record at a task/node. I'd propose to save the
> term
> "staleness" for that particular meaning, and adopt Matthias's suggestion of
> "record-e2e-latency" for this.
> 
> Thanks for hanging in there all. Please let me know if you have any
> concerns about
> this change!
> 
> Sophie
> 
> On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <mj...@apache.org> wrote:
> 
> > I am also happy with max/min/99/90. And I buy your naming argument about
> > staleness vs latency.
> >
> >
> > -Matthias
> >
> > On 5/15/20 12:24 PM, Boyang Chen wrote:
> > > Hey Sophie,
> > >
> > > 90/99/min/max make sense to me.
> > >
> > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <
> > sop...@confluent.io>
> > > wrote:
> > >
> > >> @Matthias
> > >> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark
> > of
> > >> the video
> > >> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
> > >>
> > >> And that's what he says about the 95th percentile! Imagine what he would
> > >> say about
> > >> the 50th :P
> > >>
> > >> But seriously, since we can't seem to agree that the mean or 50th
> > >> percentile is actually useful, I'm inclined to resurrect my original
> > >> proposal of including neither. I also think that's a good argument
> > >> against the 75th, which I admittedly chose somewhat arbitrarily as an
> > >> intermediate between the 50th and the higher percentiles. How about:
> > >>
> > >> -max
> > >> -p99
> > >> -p90
> > >> -min
> > >>
> > >> with p50/mean still up for debate if anyone feels strongly for either of
> > >> them.
> > >>
> > >> Regarding the name, I've already flip-flopped on this so I'm definitely
> > >> still open to
> > >> further arguments. But the reason for changing it from
> > end-to-end-latency
> > >> (which
> > >> is similar to what you propose) is that this metric technically reflects
> > >> how old (ie how "stale")
> > >> the record is when it's *received* by the operator, not when it's
> > processed
> > >> by the operator.
> > >> It seemed like there was the potential for confusion that
> > >> "end-to-end-latency" might
> > >> represent the latency from the event creation to the time the processor
> > is
> > >> done
> > >> processing it.
> > >>
> > >> @John
> > >> I'd rather err on the side of "not-enough" metrics as we can always add
> > >> this to the
> > >> stateless metrics later on. If we decide to measure the time at every
> > node
> > >> and don't
> > >> find any evidence of a serious performance impact, and users indicate
> > they
> > >> would
> > >> like to see this metric at all nodes, then we can easily start reporting
> > >> them as well.
> > >> WDYT?
> > >>
> > >> That said, sink nodes seem like a reasonable exception to the rule.
> > >> Obviously users
> > >> should be able to detect the time when the record reaches the output
> > topic
> > >> but that
> > >> still leaves a gap in understanding how long the production latency was.
> > >> This mirrors
> > >> the consumption latency that is exposed by the task-level metrics, which
> > >> are measured
> > >> at the source node. For good symmetry what if we actually expose both
> > the
> > >> source
> > >> and sink latency at the task-level? ie report both sets of statistical
> > >> measurements with
> > >> the additional tag -source/-sink.
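> > >>
> > >> Purely for illustration, reading both flavors off a running KafkaStreams
> > >> instance might look something like the sketch below -- the "record-type"
> > >> tag and its "source"/"sink" values are placeholders, not part of the
> > >> proposal:
> > >>
> > >>     // Assumes `streams` is a running KafkaStreams instance and that the
> > >>     // task-level metrics keep their current "stream-task-metrics" group;
> > >>     // the metric/tag names below are made up for this example.
> > >>     streams.metrics().forEach((name, metric) -> {
> > >>         if (name.group().equals("stream-task-metrics")
> > >>                 && name.name().startsWith("record-staleness")) {
> > >>             System.out.println(name.tags().get("record-type")
> > >>                 + " -> " + metric.metricValue());
> > >>         }
> > >>     });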
> > >>
> > >> @Bill
> > >> Thanks for the comment regarding the min! I hadn't considered that, and
> > >> it's quite useful to think about how and what is useful from a user's
> > >> point of view.
> > >>
> > >> Regarding your second point, I'm inclined to leave that as an
> > >> implementation detail
> > >> but my take would be that the user should be allowed to control the
> > record
> > >> timestamp
> > >> used for this with the timestamp extractor. My impression is that users
> > may
> > >> often embed
> > >> the actual event time in the payload for whatever reason, and this
> > >> represents the "true"
> > >> timestamp as far as the Streams topology is concerned.
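> > >>
> > >> As a sketch of what that looks like on the user side (the payload type
> > >> "EventWithTimestamp" and its accessor are made up for this example):
> > >>
> > >>     import org.apache.kafka.clients.consumer.ConsumerRecord;
> > >>     import org.apache.kafka.streams.processor.TimestampExtractor;
> > >>
> > >>     public class PayloadTimestampExtractor implements TimestampExtractor {
> > >>         @Override
> > >>         public long extract(final ConsumerRecord<Object, Object> record,
> > >>                             final long partitionTime) {
> > >>             final Object value = record.value();
> > >>             if (value instanceof EventWithTimestamp) {
> > >>                 // the embedded event time becomes the "true" timestamp
> > >>                 // that the e2e latency/staleness is measured against
> > >>                 return ((EventWithTimestamp) value).eventTimeMs();
> > >>             }
> > >>             return record.timestamp(); // fall back to the record's timestamp
> > >>         }
> > >>     }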
> > >>
> > >>
> > >> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote:
> > >>
> > >>> Thanks for the KIP, Sophie, this will be a useful metric to add.
> > >>>
> > >>> Regarding tracking min, I  think it could be valuable for users to
> > >> discern
> > >>> which parts of their topologies are more efficient since this is a
> > >>> task-level metric.  I realize everyone seems to be on board with
> > >> including
> > >>> min anyway, but I wanted to add my 2 cents on this topic should we
> > decide
> > >>> to revisit adding min or not.
> > >>>
> > >>> I do have a question regarding the calculation of staleness.
> > >>> Is there going to be a consideration for timestamp extractors? Users
> > >> could
> > >>> prefer to use a timestamp embedded in the payload, and it could skew
> > the
> > >>> measurements.
> > >>> I was wondering if we should specify in the KIP whether the arrival
> > >> time
> > >>> is always going to come from the record timestamp, or is this an
> > >>> implementation detail we can cover in the PR?
> > >>>
> > >>> Thanks!
> > >>> Bill
> > >>>
> > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org>
> > >> wrote:
> > >>>
> > >>>> Thanks for the KIP Sophie.
> > >>>>
> > >>>> I think it's not useful to record the avg/mean; it is sensitive to
> > >>>> outliers. We should rather track the median (50th percentile).
> > >>>>
> > >>>> Not sure if tracking min is useful, but I am also ok to track it.
> > >>>>
> > >>>> However, I find it odd to track the 75th percentile. Standard measures
> > >> would
> > >>>> be the 90th or 95th -- I guess we don't need both, so maybe picking 90th
> > >>>> might be more useful?
> > >>>>
> > >>>> About the name: "staleness" sounds really odd, and in fact the metric
> > >>>> does capture "latency" so we should call it "latency". I understand
> > the
> > >>>> issue that we already have a latency metric. So maybe we could call it
> > >>>> `record-e2e-latency-*` ?
> > >>>>
> > >>>> While I agree that we should include out-of-order data (the KIP should
> > >>>> talk about `out-of-order` data, not `late` data; data is only `late`
> > if
> > >>>> it's out-of-order and if it's dropped), I don't really understand why
> > >>>> the new metric would help to configure grace period or retention time?
> > >>>> As you mention in the KIP, both are defined as the max difference of
> > >>>> `event-time - stream-time` and thus the new metric that takes
> > >>>> system-/wallclock-time into account does not seem to help at all.
> > >>>>
> > >>>>
> > >>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil
> > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>> On 5/14/20 7:17 PM, John Roesler wrote:
> > >>>>> Hi Sophie,
> > >>>>>
> > >>>>> It seems like there would still be plenty of use cases for recording
> > >>>>> this metric at all processors and not just stateful ones, but I'm
> > >> happy
> > >>>>> to suspend my arguments for now. Since you're proposing to keep
> > >>>>> them at the processor-node level, it will be seamless later to add
> > >>>>> in the stateless processors if we want. As a wise man once said,
> > >>>>> "Adding is always easier than removing."
> > >>>>>
> > >>>>> Regarding the time measurement, it's an implementation detail
> > >>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly
> > >>>>> prefer to measure the system time again when recording the
> > >>>>> metric. I don't think we've seen any evidence that proves this
> > >>>>> would harm performance, and the amount of inaccuracy using
> > >>>>> the cached system time could incur is actually substantial. But,
> > >>>>> if you want to just "not mention this" in the KIP, we can defer to
> > >>>>> the actual PR discussion, at which time we're in a better position
> > >>>>> to use benchmarks, etc., to make the call.
> > >>>>>
> > >>>>> Along the lines of the measurement accuracy discussion, one
> > >>>>> minor thought I had is that maybe we should consider measuring
> > >>>>> the task staleness metric at the sink, rather than the source, so
> > >> that
> > >>>>> it includes the processing latency of the task itself, not just the
> > >>>> latency
> > >>>>> of everything up to, but not including, the task (which seems
> > >> confusing
> > >>>>> for users). I guess this could also be an implementation detail,
> > >>> though.
> > >>>>>
> > >>>>> Thanks for the update,
> > >>>>> -John
> > >>>>>
> > >>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
> > >>>>>> Hey all,
> > >>>>>>
> > >>>>>> After discussing with Bruno I'd like to propose a small amendment,
> > >>>>>> which is to record the processor-node-level metrics only for
> > >>> *stateful*
> > >>>>>> *operators*. They would still be considered a "processor-node-level"
> > >>>>>> metric and not a "state-store-level" metric as the staleness is
> > >> still
> > >>>>>> a property of the node rather than of the state itself. However, it
> > >>>> seems
> > >>>>>> that this information is primarily useful for stateful operators
> > >> that
> > >>>> might
> > >>>>>> be exposing state via IQ or are otherwise dependent on the record
> > >>>>>> time, unlike a stateless operator.
> > >>>>>>
> > >>>>>> It's worth calling out that recent performance improvements to the
> > >>>> metrics
> > >>>>>> framework mean that we no longer fetch the system time at the
> > >> operator
> > >>>>>> level, but only once per task. In other words the system time is not
> > >>>> updated
> > >>>>>> between each process as a record flows through the subtopology, so
> > >>>>>> debugging the processor-level latency via the staleness will not be
> > >>>>>> possible. Note that this doesn't mean the operator-level metrics are
> > >>> not
> > >>>>>> *useful* relative to the task-level metric. Upstream caching and/or
> > >>>>>> suppression
> > >>>>>> can still cause a record's staleness at some downstream stateful
> > >>>> operator
> > >>>>>> to deviate from the task-level staleness (recorded at the source
> > >>> node).
> > >>>>>>
> > >>>>>> Please let me know if you have any concerns about this change. The
> > >>>>>> KIP has been updated with the new proposal.
> > >>>>>>
> > >>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Sophie,
> > >>>>>>>
> > >>>>>>> Thank you for the KIP.
> > >>>>>>>
> > >>>>>>> The KIP looks good to me.
> > >>>>>>>
> > >>>>>>> 50th percentile:
> > >>>>>>> I think we do not need it now. If we need it, we can add it. Here
> > >> the
> > >>>>>>> old truism applies: Adding is always easier than removing.
> > >>>>>>>
> > >>>>>>> processor-node-level metrics:
> > >>>>>>> I think it is good to have the staleness metrics also on
> > >>>>>>> processor-node-level. If we do not want to record them on all
> > >>>>>>> processor nodes, you could restrict the recording to stateful
> > >>>>>>> processor-nodes, since those are the ones that would benefit most
> > >>> from
> > >>>>>>> the staleness metrics.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Bruno
> > >>>>>>>
> > >>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <
> > >>>> sop...@confluent.io>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Yeah, the specific reason was just to align with the current
> > >>> metrics.
> > >>>>>>>>
> > >>>>>>>> Is it better to conform than to be right? History has a lot to say
> > >>> on
> > >>>>>>> that
> > >>>>>>>> matter
> > >>>>>>>> but I'm not sure how much of it applies to the fine details of
> > >>> metrics
> > >>>>>>>> naming :P
> > >>>>>>>>
> > >>>>>>>> More seriously, I figured if people are looking at this metric
> > >>> they're
> > >>>>>>>> likely to
> > >>>>>>>> be looking at all the others. Then naming this one "-mean" would
> > >>>> probably
> > >>>>>>>> lead some to conclude that the "-avg" suffix in the other metrics
> > >>> has
> > >>>> a
> > >>>>>>>> different meaning.
> > >>>>>>>>
> > >>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll
> > >>>> swap
> > >>>>>>>> that out
> > >>>>>>>>
> > >>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org
> > >>>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks Sophie,
> > >>>>>>>>>
> > >>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose
> > >>>> "avg"
> > >>>>>>>>> instead
> > >>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be
> > >> oversensitive
> > >>>>>>> because
> > >>>>>>>>> of the mistake I just made earlier, but it strikes me that "avg"
> > >> is
> > >>>>>>>>> actually
> > >>>>>>>>> ambiguous, as it refers to a family of statistics, whereas "mean"
> > >>> is
> > >>>>>>>>> specific.
> > >>>>>>>>> I see other Kafka metrics with "avg", but none with "mean"; was
> > >>> that
> > >>>>>>> the
> > >>>>>>>>> reason? If so, I'm +1.
> > >>>>>>>>>
> > >>>>>>>>> Regarding the names of the percentiles, I actually couldn't find
> > >>>>>>>>> _any_ other metrics that use percentiles. Was there a reason to
> > >>>>>>>>> choose "99th" as opposed to "p99" or any other scheme? This is not
> > >>>>>>>>> a criticism, I'm just asking for consistency's sake.
> > >>>>>>>>>
> > >>>>>>>>> Thanks again,
> > >>>>>>>>> -John
> > >>>>>>>>>
> > >>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
> > >>>>>>>>>> Alright, I can get behind adding the min metric for the sake of
> > >>>>>>> pretty
> > >>>>>>>>>> graphs
> > >>>>>>>>>> (and trivial computation).
> > >>>>>>>>>>
> > >>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile)
> > >>> but I
> > >>>>>>> can
> > >>>>>>>>> see
> > >>>>>>>>>> how users might expect it and find it a bit disorienting not to
> > >>>>>>> have. So
> > >>>>>>>>> the
> > >>>>>>>>>> updated proposed metrics are
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>    - record-staleness-max [ms]
> > >>>>>>>>>>    - record-staleness-99th [ms] *(99th percentile)*
> > >>>>>>>>>>    - record-staleness-75th [ms] *(75th percentile)*
> > >>>>>>>>>>    - record-staleness-avg [ms] *(mean)*
> > >>>>>>>>>>    - record-staleness-min [ms]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <
> > >> vvcep...@apache.org
> > >>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I
> > >> guess
> > >>>>>>> the
> > >>>>>>>>> mean
> > >>>>>>>>>>> seems more interesting to me than the median, but neither is
> > >> as
> > >>>>>>>>>>> interesting as the higher percentiles (99th and max).
> > >>>>>>>>>>>
> > >>>>>>>>>>> Min isn’t really important for any SLAs, but it does round out
> > >>> the
> > >>>>>>>>> mental
> > >>>>>>>>>>> picture of the distribution. I’ve always graphed min along with
> > >>> the
> > >>>>>>>>> other
> > >>>>>>>>>>> metrics to help me understand how fast the system can be, which
> > >>>>>>> helps
> > >>>>>>>>> in
> > >>>>>>>>>>> optimization decisions. It’s also a relatively inexpensive
> > >> metric
> > >>>>>>> to
> > >>>>>>>>>>> compute, so it might be nice to just throw it in.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
> > >>>>>>>>>>>> G1:
> > >>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the
> > >>>>>>> specific
> > >>>>>>>>>>> task"
> > >>>>>>>>>>>> but
> > >>>>>>>>>>>> I'm happy with "record-staleness" if that drives the point
> > >> home
> > >>>>>>>>> better.
> > >>>>>>>>>>> So
> > >>>>>>>>>>>> it's the
> > >>>>>>>>>>>> "staleness of the record when it is received by that task" --
> > >>>>>>> will
> > >>>>>>>>> update
> > >>>>>>>>>>>> the KIP
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> B1/J:
> > >>>>>>>>>>>> I'm struggling to imagine a case where the min would actually
> > >> be
> > >>>>>>>>> useful,
> > >>>>>>>>>>>> rather than
> > >>>>>>>>>>>> just intellectually interesting. I don't feel strongly that we
> > >>>>>>>>> shouldn't
> > >>>>>>>>>>>> add it, but that's
> > >>>>>>>>>>>> why I didn't include it from the start. Can you enlighten me
> > >>>>>>> with an
> > >>>>>>>>>>>> example?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I was also vaguely concerned about the overhead of adding
> > >>>>>>> multiple
> > >>>>>>>>>>>> percentile
> > >>>>>>>>>>>> metrics. Do we have any data to indicate what kind of
> > >>> performance
> > >>>>>>>>> hit we
> > >>>>>>>>>>>> take on
> > >>>>>>>>>>>> metrics computation?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Also, not to be too pedantic but the 50th percentile would be
> > >>> the
> > >>>>>>>>> median
> > >>>>>>>>>>>> not the
> > >>>>>>>>>>>> mean. Would you propose to add the mean *and* the 50th
> > >>>>>>> percentile, or
> > >>>>>>>>>>> just
> > >>>>>>>>>>>> one
> > >>>>>>>>>>>> of the two?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks all!
> > >>>>>>>>>>>> Sophie
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <
> > >>>>>>> vvcep...@apache.org>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Just some comments on the discussion so far:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> B2/G1:
> > >>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans"
> > >> or
> > >>>>>>>>>>>>> "end-to-end" latency. But in practice, some of the spans are
> > >>>>>>> pretty
> > >>>>>>>>>>>>> difficult to really measure (like time spent waiting in the
> > >>>>>>>>> topics, or
> > >>>>>>>>>>>>> time from the event happening to the ETL producer choosing to
> > >>>>>>> send
> > >>>>>>>>> it,
> > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In other words, it's practically easier to compute spans by
> > >>>>>>>>> subtracting
> > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by adding
> > >>>>>>> spans.
> > >>>>>>>>> You
> > >>>>>>>>>>>>> can even just consider that the span computation from e2e
> > >>>>>>> always
> > >>>>>>>>> just
> > >>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e
> > >> latency
> > >>>>>>>>> from
> > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the end
> > >> you
> > >>>>>>>>> care
> > >>>>>>>>>>> about.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It seems like people really prefer to have spans when they
> > >> are
> > >>>>>>>>>>> debugging
> > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more general
> > >>>>>>> measurement
> > >>>>>>>>>>>>> that basically every person/application cares about and
> > >> should
> > >>>>>>> be
> > >>>>>>>>>>>>> monitoring.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Altogether, it really seems to provide more value to more
> > >>>>>>> people if
> > >>>>>>>>> we
> > >>>>>>>>>>>>> report
> > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I
> > >> think
> > >>>>>>> I
> > >>>>>>>>> have
> > >>>>>>>>>>> no
> > >>>>>>>>>>>>> preference, I'd defer to other people's intuition.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> G2:
> > >>>>>>>>>>>>> I think the processor-node metric is nice, since the inside
> > >> of
> > >>>>>>> a
> > >>>>>>>>> task
> > >>>>>>>>>>> can
> > >>>>>>>>>>>>> introduce a significant amount of latency in some cases.
> > >> Plus,
> > >>>>>>>>> it's a
> > >>>>>>>>>>> more
> > >>>>>>>>>>>>> direct measurement, if you really wanted to know (for the
> > >>>>>>> purposes
> > >>>>>>>>> of
> > >>>>>>>>>>> IQ
> > >>>>>>>>>>>>> or something) how long it takes source events to "show up" at
> > >>>>>>> the
> > >>>>>>>>>>> store.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I think actually recording it at every processor could be
> > >>>>>>>>> expensive,
> > >>>>>>>>>>> but we
> > >>>>>>>>>>>>> already record a bunch of metrics at the node level.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> B1:
> > >>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's
> > >> a
> > >>>>>>> poor
> > >>>>>>>>>>> metric
> > >>>>>>>>>>>>> for operational purposes, a lot of people might expect to see
> > >>>>>>>>> "mean".
> > >>>>>>>>>>>>> Actually,
> > >>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to leave
> > >> it
> > >>>>>>>>> off?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I might suggest:
> > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> B3:
> > >>>>>>>>>>>>> I agree we should include late records (though not the ones
> > >> we
> > >>>>>>>>> drop).
> > >>>>>>>>>>>>> It may be spiky, but only when there are legitimately some
> > >>>>>>> records
> > >>>>>>>>>>> with a
> > >>>>>>>>>>>>> high end-to-end latency, which is the whole point of these
> > >>>>>>> metrics.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> That's it! I don't think I have any other feedback, other
> > >> than
> > >>>>>>> a
> > >>>>>>>>>>> request to
> > >>>>>>>>>>>>> also report "min".
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> -John
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote:
> > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing
> > >>>>>>> latency
> > >>>>>>>>> of the
> > >>>>>>>>>>>>> task
> > >>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I have
> > >>>>>>> a
> > >>>>>>>>>>> similar
> > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency metric of
> > >>>>>>> a
> > >>>>>>>>> task A
> > >>>>>>>>>>>>>> actually measures the latency of the sub-topology up to but
> > >>>>>>> not
> > >>>>>>>>>>> including
> > >>>>>>>>>>>>>> the processing of A, which is a bit weird.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Maybe my feeling comes from the name "latency" itself,
> > >>>>>>> since
> > >>>>>>>>>>> today we
> > >>>>>>>>>>>>>> already have several "latency" metrics which are
> > >>>>>>>>> measuring
> > >>>>>>>>>>>>> elapsed
> > >>>>>>>>>>>>>> system-time for processing a record / etc, while here we are
> > >>>>>>>>>>> comparing
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>> system wallclock time with the record timestamp.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness"
> > >>>>>>> (note we
> > >>>>>>>>>>> already
> > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case recording at
> > >>>>>>> the
> > >>>>>>>>>>>>>> system-time before we start processing the record sounds
> > >> more
> > >>>>>>>>>>> natural.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
> > >>>>>>> processor-node-level
> > >>>>>>>>> DEBUG
> > >>>>>>>>>>>>>> metric is worth to add, given that we already have a
> > >>>>>>> task-level
> > >>>>>>>>>>>>> processing
> > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e latency is
> > >>>>>>>>> similar
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>> task-level e2e latency + task-level processing latency.
> > >>>>>>>>> Personally I
> > >>>>>>>>>>>>> think
> > >>>>>>>>>>>>>> having a task-level record-staleness metric is sufficient.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Guozhang
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman <
> > >>>>>>>>>>>>> sop...@confluent.io>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for
> > >>>>>>> this
> > >>>>>>>>>>>>> specific
> > >>>>>>>>>>>>>>> metric, as
> > >>>>>>>>>>>>>>> it's presumably most useful at putting an *upper* bound on
> > >>>>>>> the
> > >>>>>>>>>>> latency
> > >>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles that would
> > >>>>>>>>> hopefully
> > >>>>>>>>>>>>> give a
> > >>>>>>>>>>>>>>> good
> > >>>>>>>>>>>>>>> sense of what *most* records will experience, and what
> > >>>>>>> *close
> > >>>>>>>>> to
> > >>>>>>>>>>> all*
> > >>>>>>>>>>>>>>> records
> > >>>>>>>>>>>>>>> will.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> However I'm not married to these specific numbers and
> > >>>>>>> could be
> > >>>>>>>>>>>>> convinced.
> > >>>>>>>>>>>>>>> Would be especially interested in hearing from users on
> > >>>>>>> this.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in
> > >>>>>>>>> this KIP
> > >>>>>>>>>>>>> since
> > >>>>>>>>>>>>>>> users
> > >>>>>>>>>>>>>>> can always compute it themselves by subtracting the
> > >>>>>>> previous
> > >>>>>>>>> node's
> > >>>>>>>>>>>>>>> end-to-end latency. I guess we could do it either way since
> > >>>>>>>>> you can
> > >>>>>>>>>>>>> always
> > >>>>>>>>>>>>>>> compute one from the other, but I think the end-to-end
> > >>>>>>> latency
> > >>>>>>>>>>> feels
> > >>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>> valuable as its main motivation is not to debug
> > >>>>>>> bottlenecks
> > >>>>>>>>> in the
> > >>>>>>>>>>>>>>> topology but
> > >>>>>>>>>>>>>>> to give users a sense of how long it takes a record to be
> > >>>>>>>>> reflected
> > >>>>>>>>>>> in
> > >>>>>>>>>>>>>>> certain parts
> > >>>>>>>>>>>>>>> of the topology. For example this might be useful for users
> > >>>>>>>>> who are
> > >>>>>>>>>>>>>>> wondering
> > >>>>>>>>>>>>>>> roughly when a record that was just produced will be
> > >>>>>>> included
> > >>>>>>>>> in
> > >>>>>>>>>>> their
> > >>>>>>>>>>>>> IQ
> > >>>>>>>>>>>>>>> results.
> > >>>>>>>>>>>>>>> Debugging is just a nice side effect -- but maybe I didn't
> > >>>>>>> make
> > >>>>>>>>>>> that
> > >>>>>>>>>>>>> clear
> > >>>>>>>>>>>>>>> enough
> > >>>>>>>>>>>>>>> in the KIP's motivation.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The
> > >>>>>>> short
> > >>>>>>>>>>> answer is
> > >>>>>>>>>>>>>>> "yes",
> > >>>>>>>>>>>>>>> we will include late records. I added a paragraph to the
> > >>>>>>> end
> > >>>>>>>>> of the
> > >>>>>>>>>>>>>>> Proposed
> > >>>>>>>>>>>>>>> Changes section explaining the reasoning here, please let
> > >>>>>>> me
> > >>>>>>>>> know
> > >>>>>>>>>>> if
> > >>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>> any concerns.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric
> > >>>>>>>>>>> "process-latency",
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>> metric
> > >>>>>>>>>>>>>>> reflects the time for the literal Node#process method to
> > >>>>>>> run
> > >>>>>>>>>>> whereas
> > >>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>> metric
> > >>>>>>>>>>>>>>> would always be measured relative to the event timestamp.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing
> > >>>>>>> so
> > >>>>>>>>> I've
> > >>>>>>>>>>>>> renamed
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency" which I feel
> > >>>>>>> better
> > >>>>>>>>>>>>> reflects
> > >>>>>>>>>>>>>>> the nature
> > >>>>>>>>>>>>>>> of the metric anyway.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks for the feedback!
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <
> > >>>>>>>>>>>>> reluctanthero...@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is
> > >>>>>>>>> important
> > >>>>>>>>>>> for
> > >>>>>>>>>>>>>>>> understanding the bottleneck of the application.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> A couple of questions and ideas:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> 1. Could you clarify the rationale for picking 75, 99 and
> > >>>>>>> max
> > >>>>>>>>>>>>> percentiles?
> > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as
> > >>>>>>> well
> > >>>>>>>>> in
> > >>>>>>>>>>>>> production
> > >>>>>>>>>>>>>>>> systems.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative, I.E
> > >>>>>>> if a
> > >>>>>>>>>>> record
> > >>>>>>>>>>>>> goes
> > >>>>>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) =
> > >>>>>>> T(B->C) +
> > >>>>>>>>>>> T(A->B) +
> > >>>>>>>>>>>>>>> T(A)
> > >>>>>>>>>>>>>>>> and so on, where P() represents the captured latency,
> > >>>>>>> and T()
> > >>>>>>>>>>>>> represents
> > >>>>>>>>>>>>>>>> the time for transiting the records between two nodes,
> > >>>>>>>>> including
> > >>>>>>>>>>>>>>> processing
> > >>>>>>>>>>>>>>>> time. For monitoring purpose, maybe having T(B->C) and
> > >>>>>>>>> T(A->B)
> > >>>>>>>>>>> are
> > >>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if
> > >>>>>>> there
> > >>>>>>>>> is a
> > >>>>>>>>>>>>> spike in
> > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at the same
> > >>>>>>> time.
> > >>>>>>>>> In the
> > >>>>>>>>>>>>> same
> > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the
> > >>>>>>> record
> > >>>>>>>>> exits
> > >>>>>>>>>>>>> from the
> > >>>>>>>>>>>>>>>> sink as this marks the whole time this record spent
> > >>>>>>> inside
> > >>>>>>>>> the
> > >>>>>>>>>>>>> funnel. Do
> > >>>>>>>>>>>>>>>> you think we could have separate treatment for sink
> > >>>>>>> nodes and
> > >>>>>>>>>>> other
> > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time receiving
> > >>>>>>> the
> > >>>>>>>>>>> record
> > >>>>>>>>>>>>> from
> > >>>>>>>>>>>>>>>> last hop? I'm not proposing a solution here, just want to
> > >>>>>>>>> discuss
> > >>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>> alternative to see if it is reasonable.
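> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> (To make the cumulative behavior concrete with made-up numbers:
> > >>>>>>>>>>>>>>>> if T(A) = 10ms, T(A->B) = 20ms and T(B->C) = 30ms, then
> > >>>>>>>>>>>>>>>> P(B) = 30ms and P(C) = 60ms. A 100ms spike in T(A->B) shows up
> > >>>>>>>>>>>>>>>> in both P(B) and P(C), whereas hop-to-hop values would isolate
> > >>>>>>>>>>>>>>>> it to T(A->B).)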
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as
> > >>>>>>> well,
> > >>>>>>>>> they
> > >>>>>>>>>>>>> would
> > >>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order
> > >>>>>>>>> records are
> > >>>>>>>>>>>>>>>> interleaving with on time records. Should we also supply
> > >>>>>>> a
> > >>>>>>>>> smooth
> > >>>>>>>>>>>>> version
> > >>>>>>>>>>>>>>>> of the latency metrics, or should users just take care of it
> > >>>>>>>>>>>>>>>> themselves?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't discussed its
> > >>>>>>>>> relation
> > >>>>>>>>>>>>> with our
> > >>>>>>>>>>>>>>>> existing processing latency metrics, could you add some
> > >>>>>>>>> context
> > >>>>>>>>>>> on
> > >>>>>>>>>>>>>>>> comparison and a simple `when to use which` tutorial for
> > >>>>>>> the
> > >>>>>>>>>>> best?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Boyang
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <
> > >>>>>>>>>>>>> sop...@confluent.io
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hey all,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613 which aims
> > >>>>>>> to
> > >>>>>>>>> add
> > >>>>>>>>>>>>> end-to-end
> > >>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a look:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>
> > >>>
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>> Sophie
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>
