Hi Ted,

I updated the Rejected Alternatives section with a more thorough
description of alternatives and reasoning for choosing the solution we
proposed.

While it is clearer why the second alternative guarantees one roundtrip
for the clean leader election case, the proposed solution guarantees it as
well. This is because we cannot have more than one back-to-back leader
change due to preferred leader election where the leader is not pushed out
of the ISR. The follower will therefore have at most one leader epoch
unknown to the new leader, so the leader will be able to respond with the
epoch that the follower knows about in the first response.

For the unclean leader election case, the second alternative reduces the
number of roundtrips, but only in rare cases: we need at least 3 fast leader
changes to see the advantage. Approximate calculation: the proposed solution
requires (N+1)/2 roundtrips for N fast leader changes (worst case; it could
be fewer roundtrips for the same number of leader changes); the alternative
solution requires at most 2 roundtrips (except in very rare cases, where we
may want to limit the size of the OffsetForLeaderEpoch request). This comes
at the cost of a bigger change to the OffsetForLeaderEpoch request, a larger
OffsetForLeaderEpoch request size on average, and the additional complexity
of deciding how long the sequence should be for subsequent
OffsetForLeaderEpoch requests and of handling the edge/contrived cases where
the sequence may become too long.
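
A rough sketch of the arithmetic above, for illustration only. It assumes
that (N+1)/2 is meant as integer division, which the text does not state
explicitly, and it treats the alternative as a flat upper bound of 2
roundtrips, ignoring the very rare oversized-request case. The function and
constant names are made up for this example and are not part of the KIP.

```python
def proposed_roundtrips(n_leader_changes: int) -> int:
    """Worst-case roundtrips for the proposed solution: (N+1)/2.

    Assumption: integer (floor) division; the actual number can be lower
    for the same number of leader changes.
    """
    return (n_leader_changes + 1) // 2


# Upper bound quoted for the alternative (sending the full epoch sequence).
ALTERNATIVE_MAX_ROUNDTRIPS = 2

if __name__ == "__main__":
    print("N  proposed(worst case)  alternative(at most)")
    for n in range(1, 8):
        print(f"{n}  {proposed_roundtrips(n):>20}  {ALTERNATIVE_MAX_ROUNDTRIPS:>20}")
```

Under these assumptions the proposed solution stays at one roundtrip for up
to 2 fast leader changes and only starts needing more from 3 onward.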

So I think the main trade-off here is improving the efficiency of a broker
becoming a follower in the rare cases of unclean leader election/at least 3
fast leader changes vs. less complexity in the common case. The proposed
solution in the KIP opts for less complexity.

Please let me know if you have any concerns or suggestions.

Thanks,
Anna

On Thu, Apr 5, 2018 at 1:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For the second alternative which was rejected (The follower sends all
> sequences of {leader_epoch, end_offset})
>
> bq. also increases the size of OffsetForLeaderEpoch request by at least
> 64bit
>
> Though the size increases, the number of roundtrips is reduced meaningfully
> which would increase the robustness of the solution.
>
> Please expand the reasoning for unclean leader election for this
> alternative.
>
> Thanks
>
> On Thu, Apr 5, 2018 at 12:17 PM, Anna Povzner <a...@confluent.io> wrote:
>
> > Hi,
> >
> >
> > I just created KIP-279 to fix edge cases of log divergence for both clean
> > and unclean leader election configs.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
> >
> >
> > The KIP is basically a follow up to KIP-101, and proposes a slight
> > extension to the replication protocol to fix edge cases where logs can
> > diverge due to fast leader fail over.
> >
> >
> > Feedback and suggestions are welcome!
> >
> >
> > Thanks,
> >
> > Anna
> >
>
