Hi Colin,
Thanks for the comments!

Rewrite the Additional High Watermark advancement requirement
There was feedback that some readers may not be familiar with the high
watermark (HWM) and acks=0/1/all produce requests, and that a short recap
would help them understand the proposal. I will rewrite this part for
better readability.

Unclean recovery

The plan is to replace unclean.leader.election.enable with
unclean.recovery.strategy. Once unclean recovery is enabled, the behavior
is determined by which of the three unclean.recovery.strategy options is
set.
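To make that concrete, here is a minimal sketch of setting the new config
through the admin client once it lands. The topic name is a placeholder and
the strategy value name is my reading of the three options in the KIP
draft, not a final API:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class SetUncleanRecoveryStrategy {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // "my-topic" is a placeholder; a cluster-level default
                // could use a BROKER config resource instead.
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // Assumed option name; the KIP discusses three strategies
                // (roughly: none / balanced / aggressive).
                AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.recovery.strategy", "Balanced"),
                    AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(op)))
                     .all().get();
            }
        }
    }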

Let's keep refining the unclean recovery design. We have already
incorporated a lot of suggestions, and I hope this KIP takes Kafka's
durability to the next level.

DescribeTopicRequest

   1. Yes, the plan is to replace the MetadataRequest with the
   DescribeTopicRequest for the admin clients. I will check the details of
   the admin client and tooling changes. (A rough sketch of the client path
   follows this list.)
   2. I can let the brokers load the ELR info so that they can serve the
   DescribeTopicRequest as well.
   3. Yeah, it does not make sense to have the topic id if the
   DescribeTopicRequest is only used by the admin client.
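For item 1, a rough sketch of the client path under discussion. The
Admin#describeTopics call and the fields shown exist today (backed by
MetadataRequest); the change would back it with DescribeTopicRequest and
extend TopicDescription with the ELR fields, which I leave out here since
they are not designed yet:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.TopicDescription;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class DescribeTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                Map<String, TopicDescription> topics =
                    admin.describeTopics(List.of("my-topic"))
                         .allTopicNames().get();
                topics.get("my-topic").partitions().forEach(p ->
                    // isr() exists today; a parallel elr() accessor on
                    // TopicPartitionInfo would be the new addition.
                    System.out.println("partition " + p.partition()
                        + " leader=" + p.leader() + " isr=" + p.isr()));
            }
        }
    }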


Metrics

As for overall cluster health metrics, I think under-min-ISR is still a
useful one. The ELR is more like a safety belt: by the time it is in use,
cluster availability has already been impacted.

Maybe we can add a metric that counts the partitions where size(ISR) +
size(ELR) < min.insync.replicas. What do you think?
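A minimal sketch of the computation I have in mind, using a hypothetical
PartitionState stand-in since the controller-side types are not settled:

    import java.util.Collection;
    import java.util.List;

    public class UnderMinIsrOrElrMetric {
        // Hypothetical stand-in for the controller's per-partition state.
        record PartitionState(List<Integer> isr, List<Integer> elr,
                              int minInsyncReplicas) {}

        // Counts partitions where even ISR and ELR together fall short of
        // min.insync.replicas, i.e. the durability guarantee is lost.
        static long underMinIsrOrElrCount(Collection<PartitionState> partitions) {
            return partitions.stream()
                .filter(p -> p.isr().size() + p.elr().size()
                             < p.minInsyncReplicas())
                .count();
        }

        public static void main(String[] args) {
            var partitions = List.of(
                new PartitionState(List.of(0, 1), List.of(), 2), // healthy
                new PartitionState(List.of(0), List.of(), 2));   // counted
            System.out.println(underMinIsrOrElrCount(partitions)); // prints 1
        }
    }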

Yeah, for ongoing unclean recoveries, the controller can keep an accurate
count across failovers because the partition registration indicates whether
a recovery is needed. However, for recoveries that have already completed,
unless we persist the number somewhere, we can only reconstruct it from the
logs.
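To sketch the failover point (all names hypothetical): because the flag
lives in the partition registration itself, a newly active controller can
recount in-flight recoveries from the metadata image without any extra
persistence:

    import java.util.Collection;

    public class OngoingRecoveryCount {
        // Hypothetical view of a partition registration.
        record PartitionRegistration(boolean recoveryInProgress) {}

        // Run on controller activation: replay the registrations and
        // rebuild the in-flight unclean recovery gauge.
        static long ongoingUncleanRecoveries(
                Collection<PartitionRegistration> registrations) {
            return registrations.stream()
                .filter(PartitionRegistration::recoveryInProgress)
                .count();
        }
    }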

On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe <cmcc...@apache.org> wrote:

> Also, we should have metrics that show what is going on with regard to the
> eligible replica set. I'm not sure exactly what to suggest, but something
> that could identify when things are going wrong in the cluster.
>
> For example, maybe a metric for partitions containing replicas that are
> ineligible to be leader? That would show a spike when a broker had an
> unclean restart.
>
> Ideally, we'd also have a metric that indicates when an unclean leader
> election or a recovery happened. It's a bit tricky because the simple
> thing, of tracking it per controller, may be a bit confusing during
> failovers.
>
> best,
> Colin
>
>
> On Tue, Sep 12, 2023, at 14:25, Colin McCabe wrote:
> > Hi Calvin,
> >
> > Thanks for the KIP. I think this is a great improvement.
> >
> >> Additional High Watermark advance requirement
> >
> > Typo: change "advance" to "advancement"
> >
> >> A bit recap of some key concepts.
> >
> > Typo: change "bit" to "quick"
> >
> >> Ack=1/all produce request. It defines when the Kafka server should
> respond to the produce request
> >
> > I think this section would be clearer if we talked about the new high
> > watermark advancement requirement first, and THEN talked about its
> > impact on acks=0, acks=1, and acks=all. acks=all is of course the
> > main case we care about here, so it would be good to lead with that,
> > rather than delving into the technicalities of acks=0/1 first.
> >
> >> Unclean recovery
> >
> > So, here you are introducing a new configuration,
> > unclean.recovery.strategy. The difficult thing here is that there is a
> > lot of overlap with unclean.leader.election.enable. So we have 3
> > different settings for unclean.recovery.strategy, plus 2 different
> > settings for unclean.leader.election.enable, giving a cross product of
> > 6 different options. The following "unclean recovery manager" section
> > only applies to one of those 6 different possibilities (I think?)
> >
> > I simply don't think we need so many different election types. Really
> > the use-cases we need are people who want NO unclean elections, people
> > who want "the reasonable thing" and people who want availability at all
> > costs.
> >
> > Overall, I feel like the first half of the KIP is about the ELR, and
> > the second half is about reworking unclean leader election. It might be
> > better to move that second half to a separate KIP so that we can figure
> > it out fully. It should be fine to punt this until later and just have
> > the current behavior on empty ELR be waiting for the last known leader
> > to return. After all, that's what we do today.
> >
> >> DescribeTopicRequest
> >
> > Is the intention for AdminClient to use this RPC for
> > Admin#describeTopics ? If so, we need to describe all of the changes to
> > the admin client API, as well as changes to command-line tools like
> > kafka-topics.sh (if there are any). For example, you will probably need
> > changes to TopicDescription.java. You will also need to provide all of
> > the things that admin client needs -- for example,
> > TopicAuthorizedOperations.
> >
> > I also don't think the controller should serve this request. We want to
> > minimize load on the controller. Just like with the other metadata
> > requests like MetadataRequest, this should be served by brokers.
> >
> > It's a bit confusing why both topic ID and topic name are provided to
> > this API. Is the intention that callers should set one but not the
> > other? Or both? This needs to be clarified. Also, if we do want to
> > support lookups by UUID, that is another thing that needs to be added
> > to adminclient.
> >
> > In general, I feel like this should also probably be its own KIP since
> > it's fairly complex
> >
> > best,
> > Colin
> >
> >
> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu wrote:
> >> Hi everyone,
> >> I'd like to discuss a series of enhancements to the replication protocol.
> >>
> >> A partition replica can experience local data loss in unclean shutdown
> >> scenarios where unflushed data in the OS page cache is lost - such as an
> >> availability zone power outage or a server error. The Kafka replication
> >> protocol is designed to handle these situations by removing such replicas
> >> from the ISR and only re-adding them once they have caught up and
> >> therefore recovered any lost data. This prevents replicas that lost an
> >> arbitrary log suffix, which included committed data, from being elected
> >> leader.
> >> However, there is a "last replica standing" state which when combined
> >> with a data loss unclean shutdown event can turn a local data loss
> >> scenario into a global data loss scenario, i.e., committed data can be
> >> removed from all replicas. When the last replica in the ISR experiences
> >> an unclean shutdown and loses committed data, it will be reelected leader
> >> after starting up again, causing rejoining followers to truncate their
> >> logs and thereby removing the last copies of the committed records which
> >> the leader lost initially.
> >>
> >> The new KIP will maximize the protection and provide MinISR-1 tolerance
> >> to data loss unclean shutdown events.
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
>
