Hey Andrew -

Thank you for taking the time to reply to my questions. I'm just adding some notes to this discussion.
1. Epoch: It would be helpful to know the delta between the client-side epoch and the actual leader epoch. That helps explain why a commit sometimes fails or why the client is not making progress.

2. Client connection: If the client selects the "wrong" connection to push out the data, I assume the request would time out, which should lead to disconnecting from the node and reselecting another node, as you mentioned, via the least-loaded node.

Cheers,
P

On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:

> Hi Philip,
> Thanks for your vote and interest in the KIP.
>
> KIP-714 does not introduce any new client metrics, and that’s intentional.
> It does tell how all of the client metrics can have their names transformed
> into equivalent “telemetry metric names”, and then potentially used in
> metrics subscriptions.
>
> I am interested in the idea of client’s leader epoch in this context, but
> I don’t have an immediate plan for how best to do this, and it would take
> another KIP to enhance existing metrics or introduce some new ones. Those
> would then naturally be applicable to the metrics push introduced in KIP-714.
>
> In a similar vein, there are no existing client metrics specifically for
> auto-commit. We could add them to Kafka, but I really think this is just an
> example of asynchronous commit in which the application has decided not to
> specify when the commit should begin.
>
> It is possible to increase the cadence of pushing by modifying the
> interval.ms configuration property of the CLIENT_METRICS resource.
>
> There is an “assigned-partitions” metric for each consumer, but not one for
> active partitions. We could add one, again as a follow-on KIP.
>
> I take your point about holding on to a connection in a channel which might
> experience congestion. Do you have a suggestion for how to improve on this?
> For example, the client does have the concept of a least-loaded node.
> Maybe this is something we should investigate in the implementation and
> decide on the best approach. In general, I think sticking with the same node
> for consecutive pushes is best, but if you choose the “wrong” node to start
> with, it’s not ideal.
>
> Thanks,
> Andrew
>
> > On 8 Sep 2023, at 19:29, Philip Nee <philip...@gmail.com> wrote:
> >
> > Hey Andrew -
> >
> > +1 but I don't have a binding vote!
> >
> > It took me a while to go through the KIP. Here are some of my notes during
> > the reading:
> >
> > *Metrics*
> > - Should we care about the client's leader epoch? There is a case where the
> >   user recreates the topic, but the consumer thinks it is still the same
> >   topic and therefore, attempts to start from an offset that doesn't exist.
> >   KIP-848 addresses this issue, but I can still see some potential benefits
> >   from knowing the client's epoch information.
> > - I assume poll idle is similar to poll interval: I needed to read the
> >   description a few times.
> > - I don't have a clear use case in mind for the commit latency, but I do
> >   think sometimes people lack clarity about how much progress was tracked by
> >   the auto-commit. Would tracking auto-commit-related metrics be useful? I
> >   was thinking: the last offset committed or the actual cadence in ms.
> > - Are there cases when we need to increase the cadence of telemetry data
> >   push? i.e. variable interval.
> > - Thanks for implementing the randomized initial metric push; I think it is
> >   really important.
> > - Is there a potential use case for tracking the number of active
> >   partitions? The consumer can pause partitions via API, during revocation,
> >   or during offset reset for the stream.
> >
> > *Connections*:
> > - The KIP stated that it will keep the same connection until the connection
> >   is disconnected.
> >   I wonder if that could potentially cause congestion if it
> >   is already a busy channel, which leads to connection timeout and
> >   subsequently disconnection.
> >
> > Thanks,
> > P
> >
> > On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
> >
> >> Bumping the voting thread for KIP-714.
> >>
> >> So far, we have:
> >> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 4 Aug 2023, at 09:45, Andrew Schofield <andrew_schofi...@live.com> wrote:
> >>>
> >>> Hi,
> >>> After almost 2 1/2 years in the making, I would like to call a vote for
> >>> KIP-714 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).
> >>>
> >>> This KIP aims to improve monitoring and troubleshooting of client
> >>> performance by enabling clients to push metrics to brokers.
> >>>
> >>> I’d like to thank everyone that participated in the discussion,
> >>> especially the librdkafka team since one of the aims of the KIP is to
> >>> enable any client to participate, not just the Apache Kafka project’s Java
> >>> clients.
> >>>
> >>> Thanks,
> >>> Andrew