Hi David,

Thanks for the feedback.

On Mon, Apr 28, 2025 at 2:51 PM David Arthur
<david.art...@confluent.io.invalid> wrote:
> DA1. It might be more clear if we call the field something like
> "LastFetchedHighWaterMark" (similar to "LastFetchedEpoch"). "HighWaterMark"
> is already very prevalent in ReplicaManager, so it might be nice to have a
> different field name :)

I would like to keep the name concise. I think that the FETCH request
has a LastFetchedEpoch because Kafka has a lot of epochs with
different scopes and lifecycles, for example the producer epoch,
partition epoch, partition leader epoch and broker epoch. To my
knowledge, Kafka only has one high-watermark.

> DA2. Why use a default of max int64 instead of -1? Will these two values
> have any practical difference? It seems like both values will have the
> effect of bypassing the new logic.

Jun asked a similar question and I have updated the KIP to answer
this. With respect to the FETCH request, I group the values of the
HighWatermark field into three categories:
1. Unknown high-watermark. KRaft models this using -1. The replica
manager models this using the log start offset.
2. Known high-watermark. The field would be in the range 0 to the
maximum value of int64, inclusive.
3. The sending replica doesn't support or implement this KIP.

The default value in the schema handles bullet point 3. In this
case the HighWatermark field will not be included in the FETCH
request, so the receiving replica deserializes it to the default
value. When the HighWatermark field is not specified, Kafka should
behave as it does today, and today Kafka doesn't evaluate the HWM
when deciding to park FETCH requests. The logic - or predicate - for
parking requests can be "local HWM <= remote HWM". This is always
true if the remote HWM is the maximum value of int64, which matches
how Kafka behaves today. If we used -1 as the default instead, the
predicate would need to become "remote HWM == -1 OR local HWM <=
remote HWM."
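To make that concrete, here is a rough Java sketch of the two
alternatives. The names are hypothetical and this is not the actual
ReplicaManager code:

    final class FetchParkingPredicate {
        // Sketch only. When the sender does not implement the KIP,
        // the field is absent and deserializes to the schema default,
        // Long.MAX_VALUE.
        static boolean shouldPark(long localHwm, long remoteHwm) {
            // Always true when remoteHwm == Long.MAX_VALUE, so
            // requests from senders that predate the KIP are parked
            // exactly as they are today.
            return localHwm <= remoteHwm;
        }

        // If the default were -1 instead, the predicate would need an
        // extra clause for the "field not sent" case:
        static boolean shouldParkWithMinusOneDefault(long localHwm,
                                                     long remoteHwm) {
            return remoteHwm == -1L || localHwm <= remoteHwm;
        }
    }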

> DA3. Do we always want to return immediately if the leader sees the
> follower lagging behind the HWM? Would there be any benefit to allow the
> leader to wait a short time for data to accumulate? Something like an order
> of magnitude less time than MaxWaitMs.

That's fair. I would consider this an implementation detail. The
replica manager implementation takes many factors into account when
deciding whether to complete or park a FETCH request. I'll update
the design to state that the receiving replica may complete the
FETCH request if "remote HWM < local HWM."
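For illustration, a possible shape of that check in Java. This is
purely a sketch: the real delayed-fetch logic also weighs min bytes
and other conditions, and the short-wait fraction below is just an
example, not a proposed value:

    final class FetchCompletion {
        // Sketch only. Returns true if the parked FETCH request
        // should be completed now.
        static boolean tryComplete(long localHwm, long remoteHwm,
                                   long elapsedMs, long maxWaitMs) {
            // DA3's idea: when the remote replica lags the local HWM,
            // wait a short fraction of MaxWaitMs (e.g. a tenth) so a
            // little more data can accumulate before responding.
            long shortWaitMs = maxWaitMs / 10; // illustrative choice
            if (remoteHwm < localHwm && elapsedMs >= shortWaitMs) {
                return true;
            }
            // Existing behavior: complete once MaxWaitMs has elapsed
            // (min-bytes and other triggers omitted for brevity).
            return elapsedMs >= maxWaitMs;
        }
    }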

> DA4. The motivation section seems focused on KRaft (500ms MaxWaitMs). If I
> understand correctly, this enhancement will apply to all FETCH, not just in
> KRaft.

Yes. It also applies to the fetcher threads for regular topics in the
brokers. I didn't add them to the motivation section because the
motivation is not as strong. As Jun pointed out, fetch-from-follower
(FFF) could benefit from this feature but I don't have any strong
evidence for it. I think that we haven't seen increased latency with
FFF because the fetcher thread batches all of the topic partitions
into one FETCH request, so the HWM for any one partition still gets
propagated when other partitions in the same FETCH request cause it
to complete.

Thanks,
-- 
-José
