Re: [DISCUSS] KIP-279: Fix log divergence between leader and follower after fast leader fail over

Anna Povzner Mon, 09 Apr 2018 17:19:56 -0700

Ted and Jason, I see now how the description of unclean leader election
made the proposed approach sound more complicated (as if there were more
roundtrips). I wrote it that way to show correctness: theoretically, we
could compare the "complete epoch lineage", but in practice we compare only
the one or two most recent leader epochs.


So, Jason's statements are correct. The common case for unclean leader
election is still one roundtrip, including the rare case reported in
KAFKA-6361 (two fast consecutive leader failovers).
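
To make the roundtrip counting concrete, here is a rough sketch of the exchange as a simplified model. It is not the broker code, and every name in it (end_offset_for_epoch, converge, the lineage lists) is hypothetical. It assumes each broker keeps a sorted list of (leader_epoch, start_offset) entries plus its log end offset, that the leader answers a request with the largest epoch it knows that is less than or equal to the requested one, and that the follower retries with its next older epoch when it does not recognize the epoch in the reply.

```python
from bisect import bisect_right


def end_offset_for_epoch(lineage, log_end_offset, requested_epoch):
    """Return (epoch, end_offset) for the largest known epoch <= requested_epoch.

    lineage is a sorted list of (leader_epoch, start_offset) pairs. The end
    offset of an epoch is the start offset of the next epoch, or the log end
    offset for the latest epoch. Returns None if no such epoch is known.
    """
    epochs = [epoch for epoch, _ in lineage]
    i = bisect_right(epochs, requested_epoch) - 1
    if i < 0:
        return None
    end = lineage[i + 1][1] if i + 1 < len(lineage) else log_end_offset
    return epochs[i], end


def converge(follower_lineage, follower_leo, leader_lineage, leader_leo):
    """Return (truncation_offset, roundtrips) for the follower (assumed model)."""
    follower_epochs = [epoch for epoch, _ in follower_lineage]
    requested = follower_epochs[-1]  # start with the follower's latest epoch
    roundtrips = 0
    while True:
        roundtrips += 1
        reply = end_offset_for_epoch(leader_lineage, leader_leo, requested)
        if reply is None:
            # Leader knows nothing at or below this epoch: fall back to the
            # start of the follower's log (an assumption of this sketch).
            return follower_lineage[0][1], roundtrips
        epoch, leader_end = reply
        if epoch in follower_epochs:
            # Both sides know this epoch: truncate to the smaller end offset.
            _, follower_end = end_offset_for_epoch(
                follower_lineage, follower_leo, epoch)
            return min(leader_end, follower_end), roundtrips
        # Unknown epoch: retry with the follower's largest epoch below it.
        older = [e for e in follower_epochs if e < epoch]
        if not older:
            return follower_lineage[0][1], roundtrips
        requested = older[-1]


# One epoch unknown to the new leader: a single roundtrip is enough.
one_trip = converge([(0, 0), (1, 10)], 12, [(0, 0), (2, 8)], 15)

# Contrived interleaving of epochs: several roundtrips are needed.
many_trips = converge([(0, 0), (2, 5), (4, 9)], 12,
                      [(0, 0), (1, 4), (3, 7), (5, 10)], 14)
```

In this model, the first call finishes in one roundtrip because the leader's reply already names an epoch the follower knows. The second needs more only because the two epoch lists are made to interleave at every step, which is the contrived case discussed in this thread.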

I updated the description of handling unclean leader elections to address
this.



On Mon, Apr 9, 2018 at 10:01 AM, Jason Gustafson <ja...@confluent.io> wrote:

> Hey Anna,
>
> Thanks for picking this up! I think the solution looks good to me. Just
> wanted to check my understanding on one part. When describing the handling
> of unclean leader elections, you mention comparing the "complete epoch
> lineage" from both brokers in order to converge on the log. I think this
> makes it sound a bit scarier than it actually is. In practice, it seems
> like we'd only have multiple round trips if we hit a bunch of these already
> rare "fast leader failover" cases consecutively. As far as I know, the case
> I reported is the only instance we've seen in the wild, and with this
> solution, we'd only need one round trip to handle it. So while it may be
> theoretically possible to need multiple round trips for convergence, far
> and away the common case would only require a very small number (usually
> exactly one).
>
> Is that correct?
>
> Thanks,
> Jason
>
> On Fri, Apr 6, 2018 at 5:47 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Makes sense.
> > Thanks for the explanation.
> > -------- Original message --------
> > From: Anna Povzner <a...@confluent.io>
> > Date: 4/6/18 5:38 PM (GMT-08:00)
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-279: Fix log divergence between leader and
> > follower after fast leader fail over
> >
> > Hi Ted,
> >
> > I updated the Rejected Alternatives section with a more thorough
> > description of alternatives and reasoning for choosing the solution we
> > proposed.
> >
> > While it is clearer why the second alternative guarantees one roundtrip
> > for the clean leader election case, the proposed solution also guarantees
> > it. This is based on the fact that we cannot have more than one
> > back-to-back leader change due to preferred leader election where the
> > leader is not pushed out of the ISR, which means the follower will have at
> > most one leader epoch unknown to the new leader, and so the leader will be
> > able to respond with the epoch that the follower knows about in the first
> > response.
> >
> > For the unclean leader election case, the second alternative reduces the
> > number of roundtrips, but only in rare cases: we need at least 3 fast
> > leader changes to see the advantage. Approximate calculation: the proposed
> > solution requires (N+1)/2 roundtrips for N fast leader changes (worst case;
> > it could be fewer roundtrips for the same number of leader changes); the
> > alternative solution requires at most 2 roundtrips (except super rare
> > cases, where we may want to limit the size of the OffsetForLeaderEpoch
> > request). This comes at the cost of a bigger change in the
> > OffsetForLeaderEpoch request, a larger OffsetForLeaderEpoch request size on
> > average, and the additional complexity of deciding how long the sequence
> > should be for subsequent OffsetForLeaderEpoch requests and handling the
> > edge/contrived cases where the sequence may become too long.
> >
> > So I think the main trade-off here is improving the efficiency of a broker
> > becoming a follower in rare cases of unclean leader election with at least
> > 3 fast leader changes vs. less complexity in the common case. The solution
> > proposed in the KIP opts for less complexity.
> >
> > Please let me know if you have any concerns or suggestions.
> >
> > Thanks,
> > Anna
> >
> > On Thu, Apr 5, 2018 at 1:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > For the second alternative which was rejected (The follower sends all
> > > sequences of {leader_epoch, end_offset})
> > >
> > > bq. also increases the size of OffsetForLeaderEpoch request by at least
> > > 64bit
> > >
> > > Though the size increases, the number of roundtrips is reduced
> > > meaningfully, which would increase the robustness of the solution.
> > >
> > > Please expand the reasoning for unclean leader election for this
> > > alternative.
> > >
> > > Thanks
> > >
> > > On Thu, Apr 5, 2018 at 12:17 PM, Anna Povzner <a...@confluent.io> wrote:
> > >
> > > > Hi,
> > > >
> > > >
> > > > I just created KIP-279 to fix edge cases of log divergence for both
> > > > clean and unclean leader election configs.
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
> > > >
> > > >
> > > > The KIP is basically a follow-up to KIP-101, and proposes a slight
> > > > extension to the replication protocol to fix edge cases where logs
> > > > can diverge due to fast leader fail over.
> > > >
> > > >
> > > > Feedback and suggestions are welcome!
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Anna
> > > >
> > >
> >
>
