Hi David,

Thanks for the feedback.
On Mon, Apr 28, 2025 at 2:51 PM David Arthur
<david.art...@confluent.io.invalid> wrote:

> DA1. It might be more clear if we call the field something like
> "LastFetchedHighWaterMark" (similar to "LastFetchedEpoch"). "HighWaterMark"
> is already very prevalent in ReplicaManager, so it might be nice to have a
> different field name :)

I would like to keep the name concise. I think that the FETCH request
has a LastFetchedEpoch because Kafka has many epochs with different
scopes and lifecycles, e.g. the producer epoch, partition epoch,
partition leader epoch and broker epoch. To my knowledge, Kafka only
has one high-watermark.

> DA2. Why use a default of max int64 instead of -1? Will these two values
> have any practical difference? It seems like both values will have the
> effect of bypassing the new logic.

Jun asked a similar question and I have updated the KIP to answer
this. With respect to the FETCH request, I group the values of the
HighWatermark field into three categories:

1. Unknown high-watermark. KRaft models this using -1. The replica
manager models this using the log start offset.
2. Known high-watermark. The field takes any value from 0 to the
maximum value of int64, inclusive.
3. The sending replica doesn't support or implement this KIP.

The default value in the schema handles case 3. In this case the
HighWatermark field will not be included in the FETCH request. When
the HighWatermark field is not specified, Kafka should behave as it
does today. Today Kafka doesn't evaluate the HWM when deciding whether
to park FETCH requests. The logic, or predicate, for parking requests
can be "local HWM <= remote HWM". This predicate is always true when
the remote HWM is the maximum value of int64, so the behavior matches
Kafka's behavior today. If we used -1 for this case instead, the
predicate would become "remote HWM == -1 OR local HWM <= remote HWM"
(there is a short sketch of both predicates at the end of this
message).

> DA3. Do we always want to return immediately if the leader sees the
> follower lagging behind the HWM? Would there be any benefit to allow the
> leader to wait a short time for data to accumulate? Something like an order
> of magnitude less time than MaxWaitMs.

That's fair. I would consider this an implementation detail. The
replica manager implementation takes a lot of things into account when
deciding whether to complete or park the FETCH request. I'll update
the design to state that the receiving replica could complete the
FETCH request if "remote HWM < local HWM".

> DA4. The motivation section seems focused on KRaft (500ms MaxWaitMs). If I
> understand correctly, this enhancement will apply to all FETCH, not just in
> KRaft.

Yes. It also applies to the fetcher threads for regular topics in the
brokers. I didn't add them to the motivation section because the
motivation is not as strong there. As Jun pointed out,
fetch-from-follower (FFF) could benefit from this feature, but I don't
have any strong evidence for it. I suspect that we haven't seen
increased latency with FFF because the fetcher thread batches all of
the topic partitions into one FETCH request, so the HWM for one
partition still gets propagated promptly when data for other
partitions in the same FETCH request completes it.

Thanks,
--
-José
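P.S. To make the DA2 and DA3 answers concrete, here is a rough sketch
of the predicates in Java. This is not the actual ReplicaManager code;
the class and method names are made up for illustration:

    // Rough sketch only; not the actual ReplicaManager code.
    final class FetchHwmPredicates {
        // Schema default for the HighWatermark field when the sending
        // replica doesn't implement this KIP (case 3 above).
        static final long UNSET_HWM = Long.MAX_VALUE;

        // Park the FETCH request while the sender is caught up on the
        // HWM. This is always true when remoteHwm is UNSET_HWM, so a
        // sender that doesn't set the field parks exactly as today.
        static boolean shouldPark(long localHwm, long remoteHwm) {
            return localHwm <= remoteHwm;
        }

        // The same predicate if -1 were the schema default instead;
        // an extra clause is needed to preserve today's behavior.
        static boolean shouldParkWithSentinel(long localHwm, long remoteHwm) {
            return remoteHwm == -1 || localHwm <= remoteHwm;
        }

        // Per DA3, the receiving replica could complete the FETCH
        // request immediately when the sender's HWM is behind.
        static boolean completeImmediately(long localHwm, long remoteHwm) {
            return remoteHwm < localHwm;
        }
    }

Picking max int64 as the default lets an unset field fall through the
plain comparison instead of requiring a sentinel check on every
evaluation.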