Hi Thomas,

Thanks for the detailed write-up. I like the direction of using local log
coverage to steer leader election and reduce the KIP-1023 risk of
thin-local leaders, without hard-blocking election when the cluster has no
better option. A few follow-ups would help me understand how this will
behave in mixed production environments:

MG1:
How does the default message-based gap perform across very different
partition workloads? On quiet topics versus hot partitions, the same offset
delta can represent very different amounts of local data in practice. Would
you consider adding topic-level versions of the new configs, as Ivan
suggested in IY2?
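
To make the concern concrete, here is a rough back-of-the-envelope sketch.
The gap value and the produce rates are numbers I made up for illustration;
they are not defaults from the KIP, and the class is not Kafka code.

```java
// Illustrative only: the gap and the produce rates below are made-up numbers.
final class OffsetGapExample {
    /** Seconds of data represented by an offset gap at a steady produce rate. */
    static double secondsCovered(long offsetGap, double messagesPerSecond) {
        return offsetGap / messagesPerSecond;
    }

    public static void main(String[] args) {
        long gap = 100_000L;        // hypothetical message-based threshold
        double quietRate = 10.0;    // msgs/s on a quiet partition (assumed)
        double hotRate = 50_000.0;  // msgs/s on a hot partition (assumed)
        System.out.printf("quiet: %.0f s%n", secondsCovered(gap, quietRate));
        System.out.printf("hot:   %.0f s%n", secondsCovered(gap, hotRate));
    }
}
```

With these assumed rates, the same gap spans roughly 2.8 hours of data on the
quiet partition and about 2 seconds on the hot one, which is why a single
broker-wide value seems hard to tune.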

MG2:
If a subset of replicas (or a rack) systematically has a lower
localLogStartOffset and thus is preferred for leadership more often, do you
see that as a material risk for cross-rack traffic or per-broker load skew?
If so, is there any guidance in scope for this KIP, or a pointer to
follow-up work (e.g. placement, reassignment, or monitoring) so teams can
keep leadership balanced while still using this feature?
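
For reference, this is how I am reading the proposed ordering. It is only a
sketch under my own assumptions: the Replica class and electionOrder method
are hypothetical names, the sort is assumed to be stable (per IY1), and the
threshold config is ignored because I do not want to guess its exact
semantics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ElectionOrderExample {
    // Hypothetical stand-in for a replica; not a Kafka class.
    static final class Replica {
        final int brokerId;
        final long localLogStartOffset;

        Replica(int brokerId, long localLogStartOffset) {
            this.brokerId = brokerId;
            this.localLogStartOffset = localLogStartOffset;
        }
    }

    /** Sorts by localLogStartOffset ascending; ties keep their original order. */
    static List<Replica> electionOrder(List<Replica> targetReplicas) {
        List<Replica> sorted = new ArrayList<>(targetReplicas);
        // List.sort is a stable sort, so equal offsets preserve assignment order.
        sorted.sort(Comparator.comparingLong((Replica r) -> r.localLogStartOffset));
        return sorted;
    }

    public static void main(String[] args) {
        List<Replica> assignment = List.of(
            new Replica(1, 5_000L), new Replica(2, 5_000L), new Replica(3, 0L));
        for (Replica r : electionOrder(assignment)) {
            System.out.println(r.brokerId + " -> " + r.localLogStartOffset);
        }
    }
}
```

Under this reading, a broker that consistently reports the lowest offset
(broker 3 here) is tried first for every partition it hosts, regardless of
its position in the assignment, which is the skew I am asking about.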

MG3:
How much extra control-plane and metadata work do you expect from
localLogStartOffset updates and the added AlterPartition calls,
particularly in large clusters?
Also, when the feature is off or tiered storage isn’t used, does this path
remain completely quiet?

Regards,
Manan Gupta

On Fri, Apr 24, 2026 at 6:48 PM Ivan Yurchenko <[email protected]> wrote:

> Hi Thomas,
>
> Thank you for the KIP. The motivation makes sense to me. I have a couple
> of comments:
>
> IY1:
> > When `leader.election.prefer.early.local.log.start.offset` is enabled,
> the key change is to sort targetReplicas by local-log-start-offset
> (ascending) before selecting a leader. This ensures replicas with more
> local data (lower local-log-start-offset) are considered first in both
> election paths.
>
> I assume this is meant to say "sort stably", to preserve the original
> preference order as much as possible?
>
> IY2:
> Is there any reason a particular topic might not want to follow the new
> leader election algorithm, or is it strictly better, so that once enabled
> it is not expected to be disabled? If there is such a reason, would you
> consider adding topic-level versions of the new configs
> `leader.election.prefer.early.local.log.start.offset` and
> `leader.election.local.log.start.offset.threshold`?
>
> Best,
> Ivan
>
>
> On Mon, Mar 30, 2026, at 20:43, Thomas Thornton via dev wrote:
> > Hi all,
> >
> > We want to start a discussion thread for KIP-1303: Deprioritize Tiered
> > Storage Followers In Leader Election.
> >
> > The adopted KIP-1023 introduced an optimization allowing followers to
> > skip replicating data already in remote storage, dramatically reducing
> > ISR join time. However, as noted in KIP-1023, this creates a risk: if
> > such a follower becomes leader, it may need to serve consumer requests
> > from remote storage, impacting performance.
> >
> > This KIP proposes to mitigate this risk by preferring replicas with
> > more local data (lower localLogStartOffset) during leader election.
> > Key changes include:
> > 1) New config leader.election.prefer.early.local.log.start.offset to
> > enable the feature
> > 2) New config leader.election.local.log.start.offset.threshold to
> > avoid leader churn from minor retention timing differences
> > 3) Extending FetchRequest and AlterPartition to propagate
> > localLogStartOffset from followers → leader → controller
> >
> > The full KIP is available here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1303%3A+Deprioritize+Tiered+Storage+Followers+In+Leader+Election
> >
> > Thanks,
> > Tom
> >
>
