Hi Justine,

Thanks for the reply. I added a Rejected Alternatives section in the KIP.

Thanks,
Tony

On Fri, May 22, 2026 at 5:25 PM Justine Olshan via dev <[email protected]> wrote:

> Hey Tony,
> Sorry for the delay on the response. Thanks for sharing the metrics we have in Kafka.
>
> I think given the approaches we chose, it could be useful to include a section in the KIP for rejected alternatives with a summary of things we considered, but chose not to implement.
> That can be useful to folks who may not follow this discussion thread, but may have questions about certain design choices. Things like why gauge vs. meter and maybe some of the discussion above about whether to reset the values given the state.
>
> Thanks,
> Justine
>
> On Thu, May 14, 2026 at 10:55 AM Tony Tang via dev <[email protected]> wrote:
>
> > Hi Kevin,
> >
> > > I think we are aligned that the metric should increment in value when its corresponding timer goes from unexpired to expired during a poll of the raft client. It should not increment in value if the timer goes from expired to expired during a poll of the raft client.
> >
> > Yeah I agree with that. I'll add this point to the KIP
> >
> > Best,
> > Tony
> >
> > On Thu, May 14, 2026 at 12:09 PM Kevin Wu <[email protected]> wrote:
> >
> > > Hi Tony,
> > >
> > > Yeah we should always report these metrics. When/how to increment them is an implementation detail, but I think we are aligned that the metric should increment in value when its corresponding timer goes from unexpired to expired during a poll of the raft client. It should not increment in value if the timer goes from expired to expired during a poll of the raft client.
> > > What do you think?
> > >
> > > Best,
> > > Kevin Wu
> > >
> > > On Wed, May 13, 2026 at 7:24 PM Tony Tang via dev <[email protected]> wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > Re KW3: That makes sense to me.
> > > > In particular, if we only register the election timeout metric during Unattached/Prospective/Candidate, those are all short states. After a timeout the node changes status, and the metric gets unregistered. Since nodes spend most of their time in Follower or Leader, operators may never see this metric at all. So yeah, let's just always report these metrics.
> > > >
> > > > Re KW8:
> > > >
> > > > > Implementing this like case 1 could diverge from the meaning of the metric in some cases. For example, expiring the fetch timeout for a voter always causes a transition to Prospective, so this metric is not updated on the next poll(). It also stops getting reported.
> > > >
> > > > Since we decided not to register/unregister metrics based on state, the above concerns disappear, right?
> > > >
> > > > > observers do not transition to another state when its fetch timeout expires, so every subsequent poll would increase this metric's value, which is technically wrong based on what we defined in the KIP.
> > > >
> > > > Okay, I understand the observer concern. Let's assume this KIP will be implemented after KAFKA-20514, so the observer fetch timeout behavior will already be fixed by then.
> > > >
> > > > In summary:
> > > > 1. we always report the metrics.
> > > > 2. we only increment the counter in the poll methods when isExpired() is called, as we originally planned. Does that work for you?
> > > >
> > > > Thanks,
> > > > Tony
> > > >
> > > > On Wed, May 13, 2026 at 12:10 PM Kevin Wu <[email protected]> wrote:
> > > >
> > > > > Hi Tony,
> > > > >
> > > > > KW3: Sorry for going back and forth on this, but one issue I see with not reporting these metrics when the node's current EpochState does not contain the timer is that downstream observability platforms may not be able to capture these values very well. For example, the fetch timeout can only expire once on a FollowerAsVoter before it transitions to Prospective, and that value would not be reported/visible to operators. Only upon returning to FollowerState would the operator see the fetch timeout metric again, which cannot increment without first not reporting any value. That is not intuitive metric behavior to me, as the operator would see the value "too late".
> > > > >
> > > > > In summary, I think it is fine to always report these metrics, and if it turns out that is wrong or confusing, that's an implementation detail we can fix later. I think I was trying to draw too many parallels with the KIP-853 leader metrics I implemented previously. Leader and Follower are "long-lived" states so I think it makes sense to report metrics which only change in these states, during these states. Timer expirations specifically cause transitions to and between more "intermittent" states (e.g. Prospective, Candidate, etc.), and therefore occur at the "end" of an EpochState, so registering and unregistering them does not seem like a good idea.
> > > > >
> > > > > KW8: Yeah, basically. Implementing this like case 1 could diverge from the meaning of the metric in some cases.
> > > > > For example, expiring the fetch timeout for a voter always causes a transition to Prospective, so this metric is not updated on the next poll(). It also stops getting reported. However, in the current code, observers do not transition to another state when its fetch timeout expires, so every subsequent poll would increase this metric's value, which is technically wrong based on what we defined in the KIP.
> > > > >
> > > > > https://urldefense.com/v3/__https://issues.apache.org/jira/browse/KAFKA-20514__;!!Ayb5sqE7!qUcUv6MZLg9t1tNcXiCJCDe1vw-YajXBZeS9xD-Un3niEylSMbS3gHZyccO2mgQha4feqEvYJSE5sbQTnAhaooA$ fixes the issue for the fetch timer specifically.
> > > > >
> > > > > This comment specifically is just something I wanted to make you aware of during your implementation, since it was not super obvious to me that this interaction with poll(), EpochState transitions, and our proposed metrics occurred until a deeper look at the code.
> > > > >
> > > > > Best,
> > > > > Kevin Wu
> > > > >
> > > > > On Wed, May 13, 2026 at 8:51 AM Tony Tang via dev <[email protected]> wrote:
> > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > Thanks for the feedback.
> > > > > > Re KW 5,6,7: I agree with all three. I'll update the KIP
> > > > > > Re KW 8:
> > > > > > Case 1: The poll methods in KafkaRaftClient actively call isExpired() on the timer, and when they find the timer has expired, they increment the counter. The counting logic lives in the caller
> > > > > > Case 2: The timer itself is aware of its expiration and automatically increments the counter when it transitions from not expired to expired. The counting logic lives inside the timer.
> > > > > > Are those the cases you're talking about?
> > > > > >
> > > > > > Best,
> > > > > > Tony
> > > > > >
> > > > > > On Tue, May 12, 2026 at 5:42 PM Kevin Wu <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Tony,
> > > > > > >
> > > > > > > Thanks for the updates to the KIP. I have a few minor comments:
> > > > > > >
> > > > > > > KW5: nit, but can we say “this metric is registered only…” instead of “Registered only…” in the description column?
> > > > > > >
> > > > > > > KW6: What do you think about renaming the metric to “timeout-expirations”? This would be slightly less verbose than what we have now.
> > > > > > >
> > > > > > > KW7: Instead of having separate columns for the metric tag and name, can we have one column with the entire MBean name? For example:
> > > > > > >
> > > > > > > “kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X”.
> > > > > > >
> > > > > > > KW8: Something to consider in determining when these metric values should be incremented: should the value be incremented when the raft client reads the timer and finds it has expired? Or, should the value be incremented only when the timer goes from not expired to expired? The first is simple but means the metric value is slightly different than what we’ve described in the KIP. I feel like implementing the second may be pretty complex…
> > > > > > >
> > > > > > > Best,
> > > > > > > Kevin Wu
> > > > > > >
> > > > > > > On Fri, May 8, 2026 at 10:06 AM Tony Tang via dev <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > Thanks for the feedback!
> > > > > > > > I looked through the Kafka codebase and found a couple of existing timeout metrics:
> > > > > > > >
> > > > > > > > 1. worker-poll-timeout-count in Kafka Connect, which is a cumulative count of poll timeouts, implemented as a Gauge using an AtomicLong counter. This is the closest to what we're proposing.
> > > > > > > > 2. AcquisitionLockTimeoutPerSec in Share Groups, which is a Meter that tracks the rate of acquisition lock timeouts.
> > > > > > > >
> > > > > > > > I think our KIP follows a similar approach to worker-poll-timeout-count
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tony
> > > > > > > >
> > > > > > > > On Thu, May 7, 2026 at 5:21 PM Justine Olshan via dev <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hey Tony,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Overall, I think the idea makes sense and it seems like you and Kevin are getting closer to agreement on the exact definition of the metrics.
> > > > > > > > > I was curious, are there any other timeout metrics you can find in Kafka and how are those defined? We don't necessarily need to do the same, but was curious if there was any precedent for this type of metric.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Mon, May 4, 2026 at 9:25 AM Tony Tang via dev <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Kevin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply.
> > > > > > > > > >
> > > > > > > > > > KW3: Ok, I agree that metric values at time X should be an accurate snapshot of the node at time X, and that collecting historical values is not the responsibility of Kafka. The approach you described makes sense to me: we keep the internal counter across state transitions, but only register/unregister the metric based on whether the underlying timer exists. I'm on board with that approach
> > > > > > > > > >
> > > > > > > > > > KW4: That's a great suggestion. However, I think it's out of scope for this KIP. I'd prefer to keep this KIP focused on timeout expiration metrics and maybe we can discuss the state transition counting in the future.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Tony
> > > > > > > > > >
> > > > > > > > > > On Fri, May 1, 2026 at 5:12 PM Kevin Wu <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Tony,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reply.
> > > > > > > > > > >
> > > > > > > > > > > KW3: I don't think "count" metrics like the ones being discussed in this KIP should report values when the objects they are associated with do not exist. This would mean metric values at time X are not an accurate "snapshot" of the Kafka node at time X.
> > > > > > > > > > > In my opinion, collecting the historic values for a metric, visualizing them through dashboards, and monitoring them to alert operators are the responsibilities of a downstream observability software, not Kafka. Kafka does have the capability to create "derivative" metrics (i.e. metrics whose values are based off of sampling something else over time) via Sensors, but I don't think that fits our use case as previously discussed.
> > > > > > > > > > >
> > > > > > > > > > > Another way to think about it is that adding or removing a metric so Kafka starts or stops reporting a value to an observability service actually tells the user more information about Kafka compared to unconditionally reporting said metric value. Additionally, just because you remove a metric does not mean you need to remove the corresponding counter within the KafkaRaftMetrics object. For example, a node fetch request times out 5 times as a follower, so the node reports 5 for the metric. Next, the node becomes the leader, so it stops reporting the metric (i.e. the metric has no value). Then it becomes a follower, and starts reporting 5 again.
> > > > > > > > > > > When the next fetch timeout happens, the node reports 6 for the metric. What do you think about this behavior?
> > > > > > > > > > >
> > > > > > > > > > > KW4: If the desire is for a metric that always reports a value, what do you think about a metric that counts the number of `EpochState` transitions? I think it makes sense to report this value for the lifetime of a process, and generally, frequent state transitions are an indication something is wrong with the cluster. This would be an additional metric to the ones we discussed above.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Kevin Wu
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 1, 2026 at 12:59 PM Tony Tang via dev <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Kevin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply. Very insightful points.
> > > > > > > > > > > >
> > > > > > > > > > > > KW1: Yes, using a single tagged metric makes sense. It's cleaner and more extensible. I'll adopt this approach.
> > > > > > > > > > > >
> > > > > > > > > > > > KW2: Yes, we don't need to use `CumulativeCount`.
> > > > > > > > > > > > Already updated in the KIP
> > > > > > > > > > > >
> > > > > > > > > > > > KW3: I understand each timer is only meaningful in certain states, but the metric value is still useful for operational monitoring regardless of the current state. It tells you how many times a timeout has expired over the lifetime of the node. Hiding or clearing the metric when the node isn't in the relevant state could actually make it harder for users to diagnose historical issues, since they'd need to catch the metric while the node happens to be in the right state. For example, if a follower had repeated fetch timeout expirations and then transitions to a candidate/leader, the metrics would still be valuable for diagnosing why the leader election happened in the first place, right? If we cleared the metric on state transition, that information would be lost. The question is: do we want the metric to reflect only the latest state, or the overall timeout behavior over the node's lifetime?
> > > > > > > > > > > > I lean toward the latter, as it provides more useful information for monitoring network issues. To avoid confusion, maybe we can use the metric name lifetime-timeout-count + tag timer-name=fetch/election? What do you think?
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Apr 30, 2026 at 3:03 PM Kevin Wu <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Tony,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the KIP. I agree that having metrics for timeouts in KRaft would be a nice addition. I have a few high level comments about the KIP:
> > > > > > > > > > > > >
> > > > > > > > > > > > > KW1: Did you consider making a tagged metric like `number-of-timeouts` instead of individual metrics? You could tag by the timer name (e.g. fetch, election, update-voter, check-quorum, and begin-quorum-epoch etc.) since KRaft supports several kinds of timers, and may add more in the future.
> > > > > > > > > > > > > You can look at `NodeMetrics.java` and
> > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1180*3A*Add*generic*feature*level*metrics__;JSsrKysr!!Ayb5sqE7!qPjsZ_186iR3QjEak9hmexMYOhwzDGzvcLYwnVUujYlAy2wAAQchfvSKMr9oG7Mygg608Vz6zFCv5QDQFYUcvow$
> > > > > > > > > > > > > for an example of tagged metrics using Kafka's new metrics library. I think there is an argument we should add timeout metrics for some of these other KRaft timers I mentioned, since reporting them could also help operators diagnose network partitions or possible software bugs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > KW2: I see the "Type" for each metric is `CumulativeCount`. I think this might be overkill, and that we could just use Integer for the data type, and expose an increment method for each metric. In general, sensors are used for when multiple metrics are associated with a specific concept (e.g. `commit-latency-avg` and `commit-latency-max` are two different metrics associated with the same concept of "commit latency").
> > > > > > > > > > > > > It is hard for me to imagine that the number of timeouts occurring would have more than one metric associated with it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > KW3: Each of these timers is associated with an EpochState (e.g. the fetch timer with FollowerState, check quorum timer with LeaderState, etc.). What should the value of these metrics be when a node transitions between EpochStates? Should we stop reporting the metrics associated with the old EpochState, and start reporting the metrics associated with the new EpochState? I personally think it might be confusing if these metrics report values even if the underlying timer does not exist on the node. For example, the fetch timeout metric reporting a value when the local node is the KRaft leader seems odd to me. When we added metrics for KIP-853 associated with the leader (e.g.
> > > > > > > > > > > > > `uncommitted-voter-change`), we decided to only report values for those metrics when the local node was the leader. It would be nice if we could follow that convention for these metrics too, and document which states report which metrics in the KIP. What do you think?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Kevin Wu
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Apr 21, 2026 at 12:32 PM Tony Tang via dev <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello everyone,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'd like to start a discussion on KIP-1322: Add metrics to Kraft that measure the number of fetch timeouts and election timeouts <
> > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1322*3A*Add*metrics*to*Kraft*that*measure*the*number*of*fetch*timeouts*and*election*timeouts__;JSsrKysrKysrKysrKysr!!Ayb5sqE7!qPjsZ_186iR3QjEak9hmexMYOhwzDGzvcLYwnVUujYlAy2wAAQchfvSKMr9oG7Mygg608Vz6zFCv5QDQLt1GBmw$
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This proposal aims to add new metrics to KRaft that track how
> > > > > > > > > > > > > > often fetch timeouts and election timeouts occur.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > Tony Tang
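
For reference, the increment rule agreed in this thread (count only when a timer goes from unexpired to expired during a poll, never on expired to expired) could be sketched roughly as below. This is only an illustration: `ExpirationEdgeCounter`, `observe` and `count` are hypothetical names, not existing Kafka classes or methods, and the sketch assumes the caller passes in the result of the timer's `isExpired()` check once per poll.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Kafka): counts timer expirations on the
 * unexpired -> expired edge only, so repeated polls of an already expired
 * timer do not inflate the count.
 */
final class ExpirationEdgeCounter {
    // Cumulative count; kept across state transitions so it can always be reported.
    private final AtomicLong expirations = new AtomicLong();
    // Whether the timer was already expired at the previous observation.
    private boolean wasExpired = false;

    /** Call once per poll with the timer's current expired/unexpired status. */
    void observe(boolean isExpiredNow) {
        if (isExpiredNow && !wasExpired) {
            expirations.incrementAndGet();
        }
        wasExpired = isExpiredNow;
    }

    /** Value a gauge could report. */
    long count() {
        return expirations.get();
    }
}
```

With this shape, an observer whose fetch timer stays expired across several polls would still contribute a single increment, which is the concern raised under KW8. Whether the edge tracking lives in the caller or in the timer is left open here.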
