Hi José,

Thank you for this much-needed improvement to metadata propagation.

Maybe I missed something, but even after reading the discussion below, I still
don't understand the rationale for separating the "RPC version too old" and
"high watermark not known" cases. Is the idea that separating these cases will
make debugging easier? For example, if we see a -1 on the wire, we know it is
an unknown-HWM situation rather than an older RPC version? Or is there some
other reason for separating the two?

thanks,
Colin


On Mon, Apr 28, 2025, at 14:27, José Armando García Sancio wrote:
> Hi David,
>
> Thanks for the feedback.
>
> On Mon, Apr 28, 2025 at 2:51 PM David Arthur
> <david.art...@confluent.io.invalid> wrote:
>> DA1. It might be more clear if we call the field something like
>> "LastFetchedHighWaterMark" (similar to "LastFetchedEpoch"). "HighWaterMark"
>> is already very prevalent in ReplicaManager, so it might be nice to have a
>> different field name :)
>
> I would like to keep the name concise. I think that the FETCH request
> has a LastFetchedEpoch because Kafka has a lot of epochs with
> different scopes and lifecycles, e.g. producer epoch, partition epoch,
> partition leader epoch, and broker epoch. To my knowledge, Kafka only
> has one high-watermark.
>
>> DA2. Why use a default of max int64 instead of -1? Will these two values
>> have any practical difference? It seems like both values will have the
>> effect of bypassing the new logic.
>
> Jun asked a similar question and I have updated the KIP to answer
> this. With respect to the FETCH request, I group the values of the
> HighWatermark field into 3 categories:
> 1. Unknown high-watermark. KRaft models this using -1. The replica
> manager models this using the log start offset.
> 2. Known high-watermark. The field would be in the range 0 to the
> maximum value of int64, inclusive.
> 3. The sending replica doesn't support or implement this KIP.
>
> The default value in the schema addresses bullet point 3. In this
> case the HighWatermark field will not be included in the FETCH
> request. When the HighWatermark field is not specified, Kafka should
> behave as it does today. Today Kafka doesn't evaluate the HWM when
> deciding to park FETCH requests. The logic, or predicate, for
> parking requests can be "local HWM <= remote HWM". This is always true
> if the remote HWM is the maximum value of int64, so Kafka behaves as
> it does today. If we use -1 for this case then the predicate becomes
> "remote HWM == -1 OR local HWM <= remote HWM".
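>
> As an illustration only (made-up method name, not the actual
> ReplicaManager code), the parking check could look roughly like this
> in Java:
>
>     // remoteHwm is the HighWatermark field from the FETCH request. When
>     // the sending replica doesn't implement this KIP, the field takes its
>     // schema default of Long.MAX_VALUE, so the predicate below is always
>     // true and the new field never changes today's parking decision.
>     static boolean shouldPark(long localHwm, long remoteHwm) {
>         return localHwm <= remoteHwm;
>     }
>
>     // With a default of -1 the predicate would instead have to be:
>     //     remoteHwm == -1 || localHwm <= remoteHwm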
>
>> DA3. Do we always want to return immediately if the leader sees the
>> follower lagging behind the HWM? Would there be any benefit to allow the
>> leader to wait a short time for data to accumulate? Something like an order
>> of magnitude less time than MaxWaitMs.
>
> That's fair. I would consider this an implementation detail. The
> replica manager implementation takes a lot of things into account when
> deciding whether to complete or park the FETCH request. I'll update
> the design to state that the receiving replica could complete the
> FETCH request if "remote HWM < local HWM".
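>
> As a rough sketch (hypothetical method names again), the receiving
> replica could do something like:
>
>     // Complete the fetch right away when the fetching replica is behind
>     // the local high-watermark; otherwise fall back to the existing
>     // logic, which may park the request for up to MaxWaitMs.
>     if (remoteHwm < localHwm) {
>         completeFetchImmediately();
>     } else {
>         maybeParkFetch(maxWaitMs);
>     }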
>
>> DA4. The motivation section seems focused on KRaft (500ms MaxWaitMs). If I
>> understand correctly, this enhancement will apply to all FETCH, not just in
>> KRaft.
>
> Yes. It also applies to the fetcher threads for regular topics in the
> brokers. I didn't add them to the motivation section because the
> motivation is not as strong. As Jun pointed out, fetch-from-follower
> could benefit from this feature but I don't have any strong evidence
> for it. I think we haven't seen increased latency with FFF because
> the fetcher thread batches all of the topic partitions into one FETCH
> request, so the HWM for one partition ends up being propagated anyway
> because of the other partitions in the same FETCH request.
>
> Thanks,
> -- 
> -José
