Hi all, I'd like to clarify/modify one aspect of this KIP, which is to record the staleness at the *completion* of the record's processing by the operator/task in question, rather than on the intake. The task-level metrics will be recorded at the sink node instead of at the source, and the operator-level metrics will be recorded at the end of the operation.
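To sketch the difference in code -- every name here is made up for illustration, this is not the actual Streams internals -- the measurement would move to after the operator's work on the record is finished:

import java.util.function.LongConsumer;

class E2eLatencySketch {
    // Record the e2e latency when the record *exits* the operator, i.e. once its
    // processing has completed, so the measurement lines up with the earliest
    // moment the result is actually visible in the output topic or via IQ.
    static void processAndMeasure(final long recordTimestampMs,
                                  final Runnable operatorLogic,
                                  final LongConsumer e2eLatencySensor) {
        operatorLogic.run();                              // the operator's actual work
        final long now = System.currentTimeMillis();      // wallclock at completion
        e2eLatencySensor.accept(now - recordTimestampMs); // event time -> visible result
    }
}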
The stated purpose and intended usefulness of this KIP is to give users a way to gauge roughly how long it takes for a record to be reflected in the "results", whether these results are being read from an output topic or through IQ. To take the IQ example, the results of a record are obviously not visible until *after* that node has finished processing it. The staleness, while still potentially useful as it can impact the *way* a record is processed in a stateful and time-dependent operator, is not part of the problem this KIP specifically set out to solve. In light of this I think it's appropriate to revert the name change back to include latency, since "staleness" as described above only makes sense when measuring relative to the arrival of a record at a task/node. I'd propose to save the term "staleness" for that particular meaning, and adopt Matthias's suggestion of "record-e2e-latency" for this. Thanks for hanging in there, all. Please let me know if you have any concerns about this change! Sophie On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <mj...@apache.org> wrote: > I am also happy with max/min/99/90. And I buy your naming argument about > staleness vs latency. > > > -Matthias > > On 5/15/20 12:24 PM, Boyang Chen wrote: > > Hey Sophie, > > > > 90/99/min/max make sense to me. > > > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman < > sop...@confluent.io> > > wrote: > > > >> @Matthias > >> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark > of > >> the video > >> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293 > >> > >> And that's what he says about the 95th percentile! Imagine what he would > >> say about > >> the 50th :P > >> > >> But seriously, since we can't seem to agree that the mean or 50th > >> percentile is actually > >> useful I'm inclined to resurrect my original proposal: neither. But I > think > >> that's a good > >> argument against the 75th, which I admittedly chose somewhat > arbitrarily as > >> an > >> intermediate between the 50th and the higher percentiles. How about: > >> > >> -max > >> -p99 > >> -p90 > >> -min > >> > >> with p50/mean still up for debate if anyone feels strongly for either of > >> them. > >> > >> Regarding the name, I've already flip-flopped on this so I'm definitely > >> still open to > >> further arguments. But the reason for changing it from > end-to-end-latency > >> (which > >> is similar to what you propose) is that this metric technically reflects > >> how old (i.e. how "stale") > >> the record is when it's *received* by the operator, not when it's > processed > >> by the operator. > >> It seemed like there was the potential for confusion that > >> "end-to-end-latency" might > >> represent the latency from the event creation to the time the processor > is > >> done > >> processing it. > >> > >> @John > >> I'd rather err on the side of "not-enough" metrics as we can always add > >> this to the > >> stateless metrics later on. If we decide to measure the time at every > node > >> and don't > >> find any evidence of a serious performance impact, and users indicate > they > >> would > >> like to see this metric at all nodes, then we can easily start reporting > >> them as well. > >> WDYT? > >> > >> That said, sink nodes seem like a reasonable exception to the rule. > >> Obviously users > >> should be able to detect the time when the record reaches the output > topic > >> but that > >> still leaves a gap in understanding how long the production latency was.
> >> This mirrors > >> the consumption latency that is exposed by the task-level metrics, which > >> are measured > >> at the source node. For good symmetry what if we actually expose both > the > >> source > >> and sink latency at the task-level? i.e. report both sets of statistical > >> measurements with > >> the additional tag -source/-sink > >> > >> @Bill > >> Thanks for the comment regarding the min! I hadn't considered that > >> and it's > >> quite useful to think about how and what is useful from a user's point of > >> view. > >> > >> Regarding your second point, I'm inclined to leave that as an > >> implementation detail > >> but my take would be that the user should be allowed to control the > record > >> timestamp > >> used for this with the timestamp extractor. My impression is that users > may > >> often embed > >> the actual event time in the payload for whatever reason, and this > >> represents the "true" > >> timestamp as far as the Streams topology is concerned. > >> > >> > >> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote: > >> > >>> Thanks for the KIP, Sophie, this will be a useful metric to add. > >>> > >>> Regarding tracking min, I think it could be valuable for users to > >> discern > >>> which parts of their topologies are more efficient since this is a > >>> task-level metric. I realize everyone seems to be on board with > >> including > >>> min anyway, but I wanted to add my 2 cents on this topic should we > decide > >>> to revisit adding min or not. > >>> > >>> I do have a question regarding the calculation of staleness. > >>> Is there going to be a consideration for timestamp extractors? Users > >> could > >>> prefer to use a timestamp embedded in the payload, and it could skew > the > >>> measurements. > >>> I was wondering if we should specify in the KIP if setting the arrival > >> time > >>> is always going to come from the record timestamp, or is this an > >>> implementation detail we can cover in the PR? > >>> > >>> Thanks! > >>> Bill > >>> > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org> > >> wrote: > >>> > >>>> Thanks for the KIP Sophie. > >>>> > >>>> I think it's not useful to record the avg/mean; it's sensitive to > >>>> outliers. We should rather track the median (50th percentile). > >>>> > >>>> Not sure if tracking min is useful, but I am also ok to track it. > >>>> > >>>> However, I find it odd to track the 75th percentile. Standard measures > >> would be > >>>> the 90th or 95th -- I guess we don't need both, so maybe picking 90th > >>>> might be more useful? > >>>> > >>>> About the name: "staleness" sounds really odd, and in fact the metric > >>>> does capture "latency" so we should call it "latency". I understand > the > >>>> issue that we already have a latency metric. So maybe we could call it > >>>> `record-e2e-latency-*` ? > >>>> > >>>> While I agree that we should include out-of-order data (the KIP should > >>>> talk about `out-of-order` data, not `late` data; data is only `late` > if > >>>> it's out-of-order and if it's dropped), I don't really understand why > >>>> the new metric would help to configure grace period or retention time? > >>>> As you mention in the KIP, both are defined as the max difference of > >>>> `event-time - stream-time` and thus the new metric that takes > >>>> system-/wallclock-time into account does not seem to help at all.
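[To make the timestamp-extractor point above concrete: a user who embeds the "true" event time in the payload could plug in something like the sketch below. `OrderEvent` and its `eventTimeMs()` accessor are hypothetical, but `TimestampExtractor` is the real Streams interface, and whatever timestamp it returns would be the one the staleness/e2e-latency measurement is relative to.]

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical payload type carrying the "true" event time.
interface OrderEvent {
    long eventTimeMs();
}

public class PayloadTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final Object value = record.value();
        if (value instanceof OrderEvent) {
            return ((OrderEvent) value).eventTimeMs(); // event time embedded in the payload
        }
        return record.timestamp(); // fall back to the record's own timestamp
    }
}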
> >>>> > >>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> On 5/14/20 7:17 PM, John Roesler wrote: > >>>>> Hi Sophie, > >>>>> > >>>>> It seems like there would still be plenty of use cases for recording > >>>>> this metric at all processors and not just stateful ones, but I'm > >> happy > >>>>> to suspend my arguments for now. Since you're proposing to keep > >>>>> them at the processor-node level, it will be seamless later to add > >>>>> in the stateless processors if we want. As a wise man once said, > >>>>> "Adding is always easier than removing." > >>>>> > >>>>> Regarding the time measurement, it's an implementation detail > >>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly > >>>>> prefer to measure the system time again when recording the > >>>>> metric. I don't think we've seen any evidence that proves this > >>>>> would harm performance, and the amount of inaccuracy that > >>>>> the cached system time could incur is actually substantial. But, > >>>>> if you want to just "not mention this" in the KIP, we can defer to > >>>>> the actual PR discussion, at which time we're in a better position > >>>>> to use benchmarks, etc., to make the call. > >>>>> > >>>>> Along the lines of the measurement accuracy discussion, one > >>>>> minor thought I had is that maybe we should consider measuring > >>>>> the task staleness metric at the sink, rather than the source, so > >> that > >>>>> it includes the processing latency of the task itself, not just the > >>>> latency > >>>>> of everything up to, but not including, the task (which seems > >> confusing > >>>>> for users). I guess this could also be an implementation detail, > >>> though. > >>>>> > >>>>> Thanks for the update, > >>>>> -John > >>>>> > >>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote: > >>>>>> Hey all, > >>>>>> > >>>>>> After discussing with Bruno I'd like to propose a small amendment, > >>>>>> which is to record the processor-node-level metrics only for > >>> *stateful* > >>>>>> *operators*. They would still be considered a "processor-node-level" > >>>>>> metric and not a "state-store-level" metric as the staleness is > >> still > >>>>>> a property of the node rather than of the state itself. However, it > >>>> seems > >>>>>> that this information is primarily useful for stateful operators > >> that > >>>> might > >>>>>> be exposing state via IQ or otherwise dependent on the record time, > >>>>>> unlike a stateless operator. > >>>>>> > >>>>>> It's worth calling out that recent performance improvements to the > >>>> metrics > >>>>>> framework mean that we no longer fetch the system time at the > >> operator > >>>>>> level, but only once per task. In other words the system time is not > >>>> updated > >>>>>> between each process as a record flows through the subtopology, so > >>>>>> debugging the processor-level latency via the staleness will not > >> be > >>>>>> possible. Note that this doesn't mean the operator-level metrics are > >>> not > >>>>>> *useful* relative to the task-level metric. Upstream caching and/or > >>>>>> suppression > >>>>>> can still cause a record's staleness at some downstream stateful > >>>> operator > >>>>>> to deviate from the task-level staleness (recorded at the source > >>> node). > >>>>>> > >>>>>> Please let me know if you have any concerns about this change.
The > >>>>>> KIP has been updated with the new proposal > >>>>>> > >>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io> > >>>> wrote: > >>>>>> > >>>>>>> Hi Sophie, > >>>>>>> > >>>>>>> Thank you for the KIP. > >>>>>>> > >>>>>>> The KIP looks good to me. > >>>>>>> > >>>>>>> 50th percentile: > >>>>>>> I think we do not need it now. If we need it, we can add it. Here > >> the > >>>>>>> old truism applies: Adding is always easier than removing. > >>>>>>> > >>>>>>> processor-node-level metrics: > >>>>>>> I think it is good to have the staleness metrics also on > >>>>>>> processor-node-level. If we do not want to record them on all > >>>>>>> processor nodes, you could restrict the recording to stateful > >>>>>>> processor-nodes, since those are the ones that would benefit most > >>> from > >>>>>>> the staleness metrics. > >>>>>>> > >>>>>>> Best, > >>>>>>> Bruno > >>>>>>> > >>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman < > >>>> sop...@confluent.io> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Yeah, the specific reason was just to align with the current > >>> metrics. > >>>>>>>> > >>>>>>>> Is it better to conform than to be right? History has a lot to say > >>> on > >>>>>>> that > >>>>>>>> matter > >>>>>>>> but I'm not sure how much of it applies to the fine details of > >>> metrics > >>>>>>>> naming :P > >>>>>>>> > >>>>>>>> More seriously, I figured if people are looking at this metric > >>> they're > >>>>>>>> likely to > >>>>>>>> be looking at all the others. Then naming this one "-mean" would > >>>> probably > >>>>>>>> lead some to conclude that the "-avg" suffix in the other metrics > >>> has > >>>> a > >>>>>>>> different meaning. > >>>>>>>> > >>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll > >>>> swap > >>>>>>>> that out > >>>>>>>> > >>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org > >>> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks Sophie, > >>>>>>>>> > >>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose > >>>> "avg" > >>>>>>>>> instead > >>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be > >> oversensitive > >>>>>>> because > >>>>>>>>> of the mistake I just made earlier, but it strikes me that "avg" > >> is > >>>>>>>>> actually > >>>>>>>>> ambiguous, as it refers to a family of statistics, whereas "mean" > >>> is > >>>>>>>>> specific. > >>>>>>>>> I see other Kafka metrics with "avg", but none with "mean"; was > >>> that > >>>>>>> the > >>>>>>>>> reason? If so, I'm +1. > >>>>>>>>> > >>>>>>>>> Regarding the names of the percentile, I actually couldn't find > >>> _any_ > >>>>>>> other > >>>>>>>>> metrics that use percentile. Was there a reason to choose "99th" > >> as > >>>>>>> opposed > >>>>>>>>> to "p99" or any other scheme? This is not a criticism, I'm just > >>>>>>> primarily > >>>>>>>>> asking > >>>>>>>>> for consistency's sake. > >>>>>>>>> > >>>>>>>>> Thanks again, > >>>>>>>>> -John > >>>>>>>>> > >>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote: > >>>>>>>>>> Alright, I can get behind adding the min metric for the sake of > >>>>>>> pretty > >>>>>>>>>> graphs > >>>>>>>>>> (and trivial computation). > >>>>>>>>>> > >>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile) > >>> but I > >>>>>>> can > >>>>>>>>> see > >>>>>>>>>> how users might expect it and find it a bit disorienting not to > >>>>>>> have. 
So > >>>>>>>>> the > >>>>>>>>>> updated proposed metrics are > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> - record-staleness-max [ms] > >>>>>>>>>> - record-staleness-99th [ms] *(99th percentile)* > >>>>>>>>>> - record-staleness-75th [ms] *(75th percentile)* > >>>>>>>>>> - record-staleness-avg [ms] *(mean)* > >>>>>>>>>> - record-staleness-min [ms] > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler < > >> vvcep...@apache.org > >>>> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I > >> guess > >>>>>>> the > >>>>>>>>> mean > >>>>>>>>>>> seems more interesting to me than the median, but neither are > >> as > >>>>>>>>>>> interesting as the higher percentiles (99th and max). > >>>>>>>>>>> > >>>>>>>>>>> Min isn’t really important for any SLAs, but it does round out > >>> the > >>>>>>>>> mental > >>>>>>>>>>> picture of the distribution. I’ve always graphed min along with > >>> the > >>>>>>>>> other > >>>>>>>>>>> metrics to help me understand how fast the system can be, which > >>>>>>> helps > >>>>>>>>> in > >>>>>>>>>>> optimization decisions. It’s also a relatively inexpensive > >> metric > >>>>>>> to > >>>>>>>>>>> compute, so it might be nice to just throw it in. > >>>>>>>>>>> > >>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote: > >>>>>>>>>>>> G1: > >>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the > >>>>>>> specific > >>>>>>>>>>> task" > >>>>>>>>>>>> but > >>>>>>>>>>>> I'm happy with "record-staleness" if that drives the point > >> home > >>>>>>>>> better. > >>>>>>>>>>> So > >>>>>>>>>>>> it's the > >>>>>>>>>>>> "staleness of the record when it is received by that task" -- > >>>>>>> will > >>>>>>>>> update > >>>>>>>>>>>> the KIP > >>>>>>>>>>>> > >>>>>>>>>>>> B1/J: > >>>>>>>>>>>> I'm struggling to imagine a case where the min would actually > >> be > >>>>>>>>> useful, > >>>>>>>>>>>> rather than > >>>>>>>>>>>> just intellectually interesting. I don't feel strongly that we > >>>>>>>>> shouldn't > >>>>>>>>>>>> add it, but that's > >>>>>>>>>>>> why I didn't include it from the start. Can you enlighten me > >>>>>>> with an > >>>>>>>>>>>> example? > >>>>>>>>>>>> > >>>>>>>>>>>> I was also vaguely concerned about the overhead of adding > >>>>>>> multiple > >>>>>>>>>>>> percentile > >>>>>>>>>>>> metrics. Do we have any data to indicate what kind of > >>> performance > >>>>>>>>> hit we > >>>>>>>>>>>> take on > >>>>>>>>>>>> metrics computation? > >>>>>>>>>>>> > >>>>>>>>>>>> Also, not to be too pedantic but the 50th percentile would be > >>> the > >>>>>>>>> median > >>>>>>>>>>>> not the > >>>>>>>>>>>> mean. Would you propose to add the mean *and* the 50th > >>>>>>> percentile, or > >>>>>>>>>>> just > >>>>>>>>>>>> one > >>>>>>>>>>>> of the two? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks all! > >>>>>>>>>>>> Sophie > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler < > >>>>>>> vvcep...@apache.org> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Just some comments on the discussion so far: > >>>>>>>>>>>>> > >>>>>>>>>>>>> B2/G1: > >>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans" > >> or > >>>>>>>>>>>>> "end-to-end" latency. 
But in practice, some of the spans are > >>>>>> pretty > >>>>>>>>>>>>> difficult to really measure (like time spent waiting in the > >>>>>>>>> topics, or > >>>>>>>>>>>>> time from the event happening to the ETL producer choosing to > >>>>>> send > >>>>>>>>> it, > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.). > >>>>>>>>>>>>> > >>>>>>>>>>>>> In other words, it's practically easier to compute spans by > >>>>>>>>> subtracting > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by adding > >>>>>> spans. > >>>>>>>>> You > >>>>>>>>>>>>> can even just consider that the span computation from e2e > >>>>>> always > >>>>>>>>> just > >>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e > latency > >>>>>>>>> from > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the end > you > >>>>>>>>> care > >>>>>>>>>>> about. > >>>>>>>>>>>>> > >>>>>>>>>>>>> It seems like people really prefer to have spans when they > are > >>>>>>>>>>> debugging > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more general > >>>>>> measurement > >>>>>>>>>>>>> that basically every person/application cares about and > should > >>>>>> be > >>>>>>>>>>>>> monitoring. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Altogether, it really seems to provide more value to more > >>>>>> people if > >>>>>>>>> we > >>>>>>>>>>>>> report > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I > think > >>>>>> I > >>>>>>>>> have > >>>>>>>>>>> no > >>>>>>>>>>>>> preference, I'd defer to other people's intuition. > >>>>>>>>>>>>> > >>>>>>>>>>>>> G2: > >>>>>>>>>>>>> I think the processor-node metric is nice, since the inside > of > >>>>>> a > >>>>>>>>> task > >>>>>>>>>>> can > >>>>>>>>>>>>> introduce a significant amount of latency in some cases. > Plus, > >>>>>>>>> it's a > >>>>>>>>>>> more > >>>>>>>>>>>>> direct measurement, if you really wanted to know (for the > >>>>>> purposes > >>>>>>>>> of > >>>>>>>>>>> IQ > >>>>>>>>>>>>> or something) how long it takes source events to "show up" at > >>>>>> the > >>>>>>>>>>> store. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think actually recording it at every processor could be > >>>>>>>>> expensive, > >>>>>>>>>>> but we > >>>>>>>>>>>>> already record a bunch of metrics at the node level. > >>>>>>>>>>>>> > >>>>>>>>>>>>> B1: > >>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's > a > >>>>>> poor > >>>>>>>>>>> metric > >>>>>>>>>>>>> for operational purposes, a lot of people might expect to see > >>>>>>>>> "mean". > >>>>>>>>>>>>> Actually, > >>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to leave > it > >>>>>>>>> off? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I might suggest: > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max > >>>>>>>>>>>>> > >>>>>>>>>>>>> B3: > >>>>>>>>>>>>> I agree we should include late records (though not the ones > we > >>>>>>>>> drop). > >>>>>>>>>>>>> It may be spiky, but only when there are legitimately some > >>>>>> records > >>>>>>>>>>> with a > >>>>>>>>>>>>> high end-to-end latency, which is the whole point of these > >>>>>> metrics. > >>>>>>>>>>>>> > >>>>>>>>>>>>> That's it! I don't think I have any other feedback, other > than > >>>>>> a > >>>>>>>>>>> request to > >>>>>>>>>>>>> also report "min".
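[On the percentile-overhead question raised earlier: kafka-clients already ships a bucket-based `Percentiles` stat, so the proposed min/max/p90/p99 set could be wired up roughly as in the sketch below. The sensor, metric, and group names plus the bucket sizing are illustrative only, not necessarily what the KIP implementation would use; the main runtime cost is maintaining the fixed-size histogram.]

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Max;
import org.apache.kafka.common.metrics.stats.Min;
import org.apache.kafka.common.metrics.stats.Percentile;
import org.apache.kafka.common.metrics.stats.Percentiles;
import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;

public class E2eLatencySensorSketch {
    public static void main(final String[] args) {
        final Metrics metrics = new Metrics();
        final String group = "stream-task-metrics"; // illustrative group name

        final Sensor sensor = metrics.sensor("record-e2e-latency");
        sensor.add(metrics.metricName("record-e2e-latency-max", group), new Max());
        sensor.add(metrics.metricName("record-e2e-latency-min", group), new Min());
        // Percentiles are approximated from a fixed histogram, bounded above by
        // the largest value the buckets can represent (here 10 minutes, in ms).
        sensor.add(new Percentiles(
                4_000,           // bytes allotted to the histogram buckets
                10 * 60 * 1000d, // upper bound on representable latencies
                BucketSizing.LINEAR,
                new Percentile(metrics.metricName("record-e2e-latency-p99", group), 99),
                new Percentile(metrics.metricName("record-e2e-latency-p90", group), 90)));

        sensor.record(123d); // one record's e2e latency, in ms
        metrics.close();
    }
}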
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> -John > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote: > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing > >>>>>> latency > >>>>>>>>> of the > >>>>>>>>>>>>> task > >>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I have > >>>>>> a > >>>>>>>>>>> similar > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency metric of > >>>>>> a > >>>>>>>>> task A > >>>>>>>>>>>>>> actually measures the latency of the sub-topology up to but > >>>>>> not > >>>>>>>>>>> including > >>>>>>>>>>>>>> the processing of A, which is a bit weird. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe my feeling comes from the name "latency" itself, > >>>>>> since > >>>>>>>>>>> today we > >>>>>>>>>>>>>> already have several "latency" metrics which are > >>>>>>>>> measuring > >>>>>>>>>>>>> elapsed > >>>>>>>>>>>>>> system-time for processing a record / etc, while here we are > >>>>>>>>>>> comparing > >>>>>>>>>>>>> the > >>>>>>>>>>>>>> system wallclock time with the record timestamp. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness" > >>>>>> (note we > >>>>>>>>>>> already > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case recording at > >>>>>> the > >>>>>>>>>>>>>> system-time before we start processing the record sounds > more > >>>>>>>>>>> natural. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the > >>>>>> processor-node-level > >>>>>>>>> DEBUG > >>>>>>>>>>>>>> metric is worth adding, given that we already have a > >>>>>> task-level > >>>>>>>>>>>>> processing > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e latency is > >>>>>>>>> similar > >>>>>>>>>>> to > >>>>>>>>>>>>> the > >>>>>>>>>>>>>> task-level e2e latency + task-level processing latency. > >>>>>>>>> Personally I > >>>>>>>>>>>>> think > >>>>>>>>>>>>>> having a task-level record-staleness metric is sufficient. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman < > >>>>>>>>>>>>> sop...@confluent.io> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for > >>>>>> this > >>>>>>>>>>>>> specific > >>>>>>>>>>>>>>> metric, as > >>>>>>>>>>>>>>> it's presumably most useful at putting an *upper* bound on > >>>>>> the > >>>>>>>>>>> latency > >>>>>>>>>>>>> you > >>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles that would > >>>>>>>>> hopefully > >>>>>>>>>>>>> give a > >>>>>>>>>>>>>>> good > >>>>>>>>>>>>>>> sense of what *most* records will experience, and what > >>>>>> *close > >>>>>>>>> to > >>>>>>>>>>> all* > >>>>>>>>>>>>>>> records > >>>>>>>>>>>>>>> will. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> However I'm not married to these specific numbers and > >>>>>> could be > >>>>>>>>>>>>> convinced. > >>>>>>>>>>>>>>> Would be especially interested in hearing from users on > >>>>>> this. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in > >>>>>>>>> this KIP > >>>>>>>>>>>>> since > >>>>>>>>>>>>>>> users > >>>>>>>>>>>>>>> can always compute it themselves by subtracting the > >>>>>> previous > >>>>>>>>> node's > >>>>>>>>>>>>>>> end-to-end latency.
I guess we could do it either way since > >>>>>>>>> you can > >>>>>>>>>>>>> always > >>>>>>>>>>>>>>> compute one from the other, but I think the end-to-end > >>>>>> latency > >>>>>>>>>>> feels > >>>>>>>>>>>>> more > >>>>>>>>>>>>>>> valuable as its main motivation is not to debug > >>>>>> bottlenecks > >>>>>>>>> in the > >>>>>>>>>>>>>>> topology but > >>>>>>>>>>>>>>> to give users a sense of how long it takes a record to be > >>>>>>>>> reflected > >>>>>>>>>>> in > >>>>>>>>>>>>>>> certain parts > >>>>>>>>>>>>>>> of the topology. For example this might be useful for users > >>>>>>>>> who are > >>>>>>>>>>>>>>> wondering > >>>>>>>>>>>>>>> roughly when a record that was just produced will be > >>>>>> included > >>>>>>>>> in > >>>>>>>>>>> their > >>>>>>>>>>>>> IQ > >>>>>>>>>>>>>>> results. > >>>>>>>>>>>>>>> Debugging is just a nice side effect -- but maybe I didn't > >>>>>> make > >>>>>>>>>>> that > >>>>>>>>>>>>> clear > >>>>>>>>>>>>>>> enough > >>>>>>>>>>>>>>> in the KIP's motivation. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The > >>>>>> short > >>>>>>>>>>> answer is > >>>>>>>>>>>>>>> "yes", > >>>>>>>>>>>>>>> we will include late records. I added a paragraph to the > >>>>>> end > >>>>>>>>> of the > >>>>>>>>>>>>>>> Proposed > >>>>>>>>>>>>>>> Changes section explaining the reasoning here, please let > >>>>>> me > >>>>>>>>> know > >>>>>>>>>>> if > >>>>>>>>>>>>> you > >>>>>>>>>>>>>>> have > >>>>>>>>>>>>>>> any concerns. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric > >>>>>>>>>>> "process-latency", > >>>>>>>>>>>>> that > >>>>>>>>>>>>>>> metric > >>>>>>>>>>>>>>> reflects the time for the literal Node#process method to > >>>>>> run > >>>>>>>>>>> whereas > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>> metric > >>>>>>>>>>>>>>> would always be measured relative to the event timestamp. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing > >>>>>> so > >>>>>>>>> I've > >>>>>>>>>>>>> renamed > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency", which I feel > >>>>>> better > >>>>>>>>>>>>> reflects > >>>>>>>>>>>>>>> the nature > >>>>>>>>>>>>>>> of the metric anyway. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks for the feedback! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen < > >>>>>>>>>>>>> reluctanthero...@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is > >>>>>>>>> important > >>>>>>>>>>> for > >>>>>>>>>>>>>>>> understanding the bottleneck of the application. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> A couple of questions and ideas: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 1. Could you clarify the rationale for picking 75, 99 and > >>>>>> max > >>>>>>>>>>>>> percentiles? > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as > >>>>>> well > >>>>>>>>> in > >>>>>>>>>>>>> production > >>>>>>>>>>>>>>>> systems. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 2.
The current latency being computed is cumulative, i.e. > >>>>> if a > >>>>> record > >>>>>>>>> goes > >>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) = > >>>>>> T(B->C) + > >>>>>>>>>>> T(A->B) + > >>>>>>>>>>>>>>> T(A) > >>>>>>>>>>>>>>>> and so on, where P() represents the captured latency, > >>>>>> and T() > >>>>>>>>>>>>> represents > >>>>>>>>>>>>>>>> the time for transiting the records between two nodes, > >>>>>>>>> including > >>>>>>>>>>>>>>> processing > >>>>>>>>>>>>>>>> time. For monitoring purposes, maybe T(B->C) and > >>>>>>>>> T(A->B) > >>>>>>>>>>> are > >>>>>>>>>>>>> more > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if > >>>>>> there > >>>>>>>>> is a > >>>>>>>>>>>>> spike in > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at the same > >>>>>> time. > >>>>>>>>> In the > >>>>>>>>>>>>> same > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the > >>>>>> record > >>>>>>>>> exits > >>>>>>>>>>>>> from the > >>>>>>>>>>>>>>>> sink as this marks the whole time this record spent > >>>>>> inside > >>>>>>>>> the > >>>>>>>>>>>>> funnel. Do > >>>>>>>>>>>>>>>> you think we could have separate treatment for sink > >>>>>> nodes and > >>>>>>>>>>> other > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time receiving > >>>>>> the > >>>>>>>>>>> record > >>>>>>>>>>>>> from > >>>>>>>>>>>>>>>> the last hop? I'm not proposing a solution here, just want to > >>>>>>>>> discuss > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>>> alternative to see if it is reasonable. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as > >>>>>> well, > >>>>>>>>> they > >>>>>>>>>>>>> would > >>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order > >>>>>>>>> records are > >>>>>>>>>>>>>>>> interleaving with on-time records. Should we also supply > >>>>>> a > >>>>>>>>> smooth > >>>>>>>>>>>>> version > >>>>>>>>>>>>>>>> of the latency metrics, or should users just take care of > >>>>>> it > >>>>>>>>> by > >>>>>>>>>>>>> themselves? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't discussed its > >>>>>>>>> relation > >>>>>>>>>>>>> with our > >>>>>>>>>>>>>>>> existing processing latency metrics, could you add some > >>>>>>>>> context > >>>>>>>>>>> on > >>>>>>>>>>>>>>>> the comparison and a simple `when to use which` > >>>>>>>>>>> tutorial? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Boyang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman < > >>>>>>>>>>>>> sop...@confluent.io > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hey all, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613 which aims > >>>>>> to > >>>>>>>>> add > >>>>>>>>>>>>> end-to-end > >>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a look: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>> Sophie > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>>> > >>> > >> > > > >