Re: [DISCUSS] KIP-279: Fix log divergence between leader and follower after fast leader fail over

Jason Gustafson Mon, 09 Apr 2018 10:02:06 -0700

Hey Anna,

Thanks for picking this up! I think the solution looks good to me. Just
wanted to check my understanding on one part. When describing the handling
of unclean leader elections, you mention comparing the "complete epoch
lineage" from both brokers in order to converge on the log. I think this
makes it sound a bit scarier than it actually is. In practice, it seems
like we'd only have multiple round trips if we hit a bunch of these already
rare "fast leader failover" cases consecutively. As far as I know, the case
I reported is the only instance we've seen in the wild, and with this
solution, we'd only need one round trip to handle it. So while it may be
theoretically possible to need multiple round trips for convergence, far
and away the common case would only require a very small number (usually
exactly one).


Is that correct?

Thanks,
Jason
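
To make the "one round trip" case above concrete, here is a minimal sketch of
the exchange as I understand it. It assumes the leader answers an
OffsetForLeaderEpoch request with the largest epoch it knows that is not
greater than the requested one, plus that epoch's end offset. The function
names, the (epoch, start_offset) list layout, and all offsets are hypothetical
and chosen only for illustration; this is not the broker code.

```python
from bisect import bisect_right

def end_offset_for_epoch(epoch_starts, log_end_offset, requested_epoch):
    """Look up the largest known epoch <= requested_epoch.

    epoch_starts: list of (epoch, start_offset), sorted by epoch (assumed layout).
    Returns (epoch, end_offset), or None if every known epoch is larger.
    """
    epochs = [epoch for epoch, _ in epoch_starts]
    i = bisect_right(epochs, requested_epoch) - 1
    if i < 0:
        return None
    # An epoch ends where the next epoch starts; the latest epoch ends at the log end.
    if i + 1 < len(epoch_starts):
        end_offset = epoch_starts[i + 1][1]
    else:
        end_offset = log_end_offset
    return epochs[i], end_offset

def follower_truncation_offset(follower_starts, follower_leo, leader_reply):
    """Return the offset to truncate to, or None if another round trip is needed."""
    leader_epoch, leader_end = leader_reply
    own = end_offset_for_epoch(follower_starts, follower_leo, leader_epoch)
    if own is None or own[0] != leader_epoch:
        # The follower does not know the epoch the leader replied with.
        return None
    return min(leader_end, own[1])

# Hypothetical logs: the follower wrote epoch 2, which the new leader never saw.
leader_starts = [(0, 0), (1, 10), (3, 15)]
follower_starts = [(0, 0), (1, 10), (2, 12)]

reply = end_offset_for_epoch(leader_starts, 20, requested_epoch=2)
truncate_to = follower_truncation_offset(follower_starts, 14, reply)
print(reply, truncate_to)
```

In this example the follower has exactly one epoch unknown to the leader, so
the first reply already names an epoch the follower knows and no second request
is needed.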

On Fri, Apr 6, 2018 at 5:47 PM, Ted Yu <[email protected]> wrote:

> Makes sense.
> Thanks for the explanation.
> -------- Original message --------
> From: Anna Povzner <[email protected]>
> Date: 4/6/18 5:38 PM (GMT-08:00)
> To: [email protected]
> Subject: Re: [DISCUSS] KIP-279: Fix log divergence between leader and
> follower after fast leader fail over
>
> Hi Ted,
>
> I updated the Rejected Alternatives section with a more thorough
> description of alternatives and reasoning for choosing the solution we
> proposed.
>
> While it is more clear why the second alternative guarantees one roundtrip
> for the clean leader election case, the proposed solution also guarantees
> it. This is based on the fact that we cannot have more than one
> back-to-back leader change due to preferred leader election where the
> leader is not pushed out of the ISR, which means the follower will have at
> most one leader epoch unknown to the new leader, and so the leader will be
> able to respond with the epoch that the follower knows about in the first
> response.
>
> For the unclean leader election case, the second alternative reduces the
> number of roundtrips, but only in rare cases: we need at least 3 fast leader
> changes to see the advantage. Approximate calculation: the proposed solution
> requires (N+1)/2 roundtrips for N fast leader changes (worst case; it could
> be fewer roundtrips for the same number of leader changes); the alternative
> solution requires at most 2 roundtrips (except in super rare cases, where we
> may want to limit the size of the OffsetForLeaderEpoch request). This comes
> at the cost of a bigger change to the OffsetForLeaderEpoch request, a larger
> OffsetForLeaderEpoch request size on average, and the additional complexity
> of deciding how long the sequence should be for subsequent
> OffsetForLeaderEpoch requests and of handling the edge/contrived cases where
> the sequence may become too long.
>
> So I think the main trade-off here is between improving the efficiency of a
> broker becoming a follower in rare cases (unclean leader election with at
> least 3 fast leader changes) and keeping complexity low in the common case.
> The solution proposed in the KIP opts for less complexity.
>
> Please let me know if you have any concerns or suggestions.
>
> Thanks,
> Anna
>
> On Thu, Apr 5, 2018 at 1:33 PM, Ted Yu <[email protected]> wrote:
>
> > For the second alternative which was rejected (The follower sends all
> > sequences of {leader_epoch, end_offset})
> >
> > bq. also increases the size of OffsetForLeaderEpoch request by at least
> > 64bit
> >
> > Though the size increases, the number of roundtrips is reduced
> > meaningfully, which would increase the robustness of the solution.
> >
> > Please expand the reasoning for unclean leader election for this
> > alternative.
> >
> > Thanks
> >
> > On Thu, Apr 5, 2018 at 12:17 PM, Anna Povzner <[email protected]> wrote:
> >
> > > Hi,
> > >
> > >
> > > I just created KIP-279 to fix edge cases of log divergence for both
> > > clean and unclean leader election configs.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
> > >
> > >
> > > The KIP is basically a follow up to KIP-101, and proposes a slight
> > > extension to the replication protocol to fix edge cases where logs can
> > > diverge due to fast leader fail over.
> > >
> > >
> > > Feedback and suggestions are welcome!
> > >
> > >
> > > Thanks,
> > >
> > > Anna
> > >
> >
>
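
For reference, the approximate roundtrip calculation from Anna's reply quoted
above can be tabulated with a few lines of Python. This is only a sketch: the
thread gives (N+1)/2 without saying how to round, so the integer (floor)
division below is an assumption, and the function name is made up for
illustration.

```python
def proposed_worst_case_roundtrips(fast_leader_changes):
    """Worst-case roundtrips for the proposed solution: (N+1)/2.

    Floor division is an assumption; the thread does not specify rounding.
    """
    return (fast_leader_changes + 1) // 2

# Upper bound quoted in the thread for the rejected alternative
# (ignoring the "super rare" cases where the request size would be capped).
ALTERNATIVE_MAX_ROUNDTRIPS = 2

for n in (1, 2, 3, 5, 9):
    print(n, proposed_worst_case_roundtrips(n), ALTERNATIVE_MAX_ROUNDTRIPS)
```

Under that rounding assumption, one or two fast leader changes need a single
roundtrip, and the estimate first rises above one at 3 changes, which matches
the "at least 3 fast leader changes" threshold mentioned in the thread.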
