Hi, Calvin,

Thanks for the updated KIP. Made another pass. A few more comments below.

40. unclean.leader.election.enable=false ->
unclean.recovery.strategy=Balanced: The Balanced mode could still lead to
data loss. So, I am wondering whether unclean.leader.election.enable=false
should map to None instead?
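
For illustration, the mapping I have in mind (a sketch only; assuming
true maps to Aggressive, per the discussion further down the thread):

unclean.leader.election.enable=false -> unclean.recovery.strategy=None
unclean.leader.election.enable=true  -> unclean.recovery.strategy=Aggressive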

41. unclean.recovery.manager.enabled: I am not sure why we introduce this
additional config. Is it the same as unclean.recovery.strategy=None?

42. DescribeTopicResponse.TopicAuthorizedOperations: Should this be at the
topic level?

43. "Limit: 20 topics max per request": Could we describe what happens if
the request includes more than 20 topics?

44. ElectLeadersRequest.DesiredLeaders: Could we describe whether the
ordering matters?

45. GetReplicaLogInfo.TopicPartitions: "about": "The topic partitions to
elect leaders.": The description in "about" is incorrect.

46. GetReplicaLogInfoResponse: Should we nest partitions under topicId to
be consistent with other types of responses?

47. kafka-leader-election.sh:
47.1 Could we explain DESIGNATION?
47.2 desiredLeader: Should it be a list to match the field in
ElectLeadersRequest?

48. Could we add a section on downgrade?

49. LastKnownLeader: This seems only needed in the first phase of
delivering ELR. Will it be removed when the complete KIP is delivered?

Thanks,

Jun

On Tue, Sep 19, 2023 at 1:30 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Calvin,
>
> Thanks for the explanations. I like the idea of using none, balanced,
> aggressive. We also had an offline discussion about why it is good to use a
> new config key (basically, so that we can deprecate the old one, which had
> only false/true values, in 4.0). With these changes, I am +1.
>
> best,
> Colin
>
> On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote:
> > Hi Colin,
> > Also, can we deprecate unclean.leader.election.enable in 4.0? Before
> > that, we can have both configs, unclean.recovery.strategy and
> > unclean.leader.election.enable, and use unclean.recovery.Enabled to
> > determine which config to apply during the unclean leader election.
> >
> > On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu <ca...@confluent.io> wrote:
> >
> >> Hi Colin,
> >> For the unclean.recovery.strategy config name, how about we use the
> >> following values:
> >> None. It basically means no unclean recovery will be performed.
> >> Aggressive. It means availability goes first. Whenever the partition
> >> can't elect a durable replica, the controller will try the unclean
> >> recovery.
> >> Balanced. It is the middle ground between availability first
> >> (Aggressive) and least availability (None). The controller performs
> >> unclean recovery only when both ISR and ELR are empty.
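> >>
> >> For illustration only, the trigger condition for each value could be
> >> sketched like this (the enum and the durableLeaderAvailable flag are
> >> hypothetical names, not part of the KIP):
> >>
> >> enum UncleanRecoveryStrategy { NONE, BALANCED, AGGRESSIVE } // hypothetical
> >>
> >> static boolean shouldAttemptUncleanRecovery(UncleanRecoveryStrategy s,
> >>         java.util.Set<Integer> isr, java.util.Set<Integer> elr,
> >>         boolean durableLeaderAvailable) {
> >>     switch (s) {
> >>         case NONE:       return false;                          // never
> >>         case BALANCED:   return isr.isEmpty() && elr.isEmpty(); // last resort
> >>         case AGGRESSIVE: return !durableLeaderAvailable;        // availability first
> >>         default:         return false;
> >>     }
> >> }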
> >>
> >>
> >> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu <ca...@confluent.io> wrote:
> >>
> >>> Hi Colin,
> >>>
> >>> > So, the proposal is that if someone sets
> >>> > "unclean.leader.election.enable = true"...
> >>>
> >>>
> >>> The idea is to choose between unclean.leader.election.enable and
> >>> unclean.recovery.strategy based on unclean.recovery.Enabled. A
> >>> possible version can be:
> >>>
> >>> if (unclean.recovery.Enabled) {
> >>>     // Check unclean.recovery.strategy. If set, use it. Otherwise,
> >>>     // check unclean.leader.election.enable and translate it to
> >>>     // unclean.recovery.strategy.
> >>> } else {
> >>>     // Use unclean.leader.election.enable.
> >>> }
> >>>
> >>>
> >>> ----------
> >>>
> >>> > The configuration key should be "unclean.recovery.manager.enabled",
> >>> > right?
> >>>
> >>>
> >>> I think we have two ways of choosing a leader uncleanly: unclean
> >>> leader election and unclean recovery (log inspection), and we try to
> >>> switch between them.
> >>>
> >>> Do you mean we want to develop two ways of performing the unclean
> >>> recovery, one of them using the "unclean recovery manager"? I guess
> >>> we haven't discussed the second way.
> >>>
> >>>
> >>> ----------
> >>>
> >>> > How do these 4 levels of overrides interact with your new
> >>> > configurations?
> >>>
> >>>
> >>> I do notice that in the KRaft controller code, the method that
> >>> checks whether to perform unclean leader election
> >>> (uncleanLeaderElectionEnabledForTopic) has been hard-coded to false
> >>> since 2021. Isn't this a good chance to completely deprecate
> >>> unclean.leader.election.enable? We don't even have to worry about
> >>> the config conversion.
> >>>
> >>> On the other hand, whatever the override is, as long as the
> >>> controller has the final effective unclean.leader.election.enable,
> >>> the topic-level config unclean.recovery.strategy, and the
> >>> cluster-level config unclean.recovery.Enabled, it can calculate the
> >>> correct method to use, right?
> >>>
> >>>
> >>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe <cmcc...@apache.org> wrote:
> >>>
> >>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
> >>>> > Hi Colin
> >>>> > 1. I think using the new config name is clearer.
> >>>> >        a. The unclean leader election is actually removed if unclean
> >>>> > recovery is in use.
> >>>> >        b. Using multiple values in unclean.leader.election.enable is
> >>>> > confusing, and it will be more confusing after people forget about
> >>>> > this discussion.
> >>>>
> >>>> Hi Calvin,
> >>>>
> >>>> So, the proposal is that if someone sets
> >>>> "unclean.leader.election.enable = true" but then sets one of your new
> >>>> configurations, the value of unclean.leader.election.enable is
> >>>> ignored? That seems less clear to me, not more. Just in general,
> >>>> having multiple configuration keys to control the same thing confuses
> >>>> users. Basically, they are sitting at a giant control panel, and some
> >>>> of the levers do nothing.
> >>>>
> >>>> > 2. Sorry I forgot to mention in the response that I did add the
> >>>> > unclean.recovery.Enabled flag.
> >>>>
> >>>> The configuration key should be "unclean.recovery.manager.enabled",
> >>>> right? Because we can do "unclean recovery" without the manager.
> >>>> Disabling the manager just means we use a different mechanism for
> >>>> recovery.
> >>>>
> >>>> >        c. Maybe I underestimated the challenge of replacing the
> >>>> > config. Any implementation problems ahead?
> >>>>
> >>>> There are four levels of overrides for unclean.leader.election.enable.
> >>>>
> >>>> 1. static configuration for node
> >>>>    This goes in the configuration file, typically named
> >>>>    server.properties
> >>>>
> >>>> 2. dynamic configuration for node default
> >>>>   ConfigResource(type=BROKER, name="")
> >>>>
> >>>> 3. dynamic configuration for node
> >>>>   ConfigResource(type=BROKER, name=<controller id>)
> >>>>
> >>>> 4. dynamic configuration for topic
> >>>>   ConfigResource(type=TOPIC, name=<topic-name>)
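> >>>>
> >>>> (For reference, the level 4 override can be set through the admin
> >>>> client like this. Just a sketch: the topic name is made up, and the
> >>>> imports from org.apache.kafka.clients.admin,
> >>>> org.apache.kafka.common.config, and java.util are omitted.)
> >>>>
> >>>> // Sketch: set a topic-level override for
> >>>> // unclean.leader.election.enable via incrementalAlterConfigs.
> >>>> static void setTopicOverride(Admin admin) throws Exception {
> >>>>     ConfigResource topic =
> >>>>         new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
> >>>>     AlterConfigOp op = new AlterConfigOp(
> >>>>         new ConfigEntry("unclean.leader.election.enable", "true"),
> >>>>         AlterConfigOp.OpType.SET);
> >>>>     Map<ConfigResource, Collection<AlterConfigOp>> configs =
> >>>>         Collections.singletonMap(topic, Collections.singletonList(op));
> >>>>     admin.incrementalAlterConfigs(configs).all().get();
> >>>> }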
> >>>>
> >>>> How do these 4 levels of overrides interact with your new
> >>>> configurations? If the new configurations dominate over the old
> >>>> ones, it seems like this will get a lot more confusing to implement
> >>>> (and also to use.)
> >>>>
> >>>> Again, I'd recommend just adding some new values to
> >>>> unclean.leader.election.enable. It's simple and will prevent user
> >>>> confusion (as well as developer confusion.)
> >>>>
> >>>> best,
> >>>> Colin
> >>>>
> >>>>
> >>>> > 3. About the admin client, I mentioned 3 changes in the client.
> >>>> > Anything else I missed in the KIP?
> >>>> >       a. The client will switch to using the new RPC instead of
> >>>> > MetadataRequest for the topics.
> >>>> >       b. The TopicPartitionInfo used in TopicDescription needs to
> >>>> > add new fields related to the ELR.
> >>>> >       c. The outputs will add the ELR-related fields.
> >>>> >
> >>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>>> >
> >>>> >> Hi Calvin,
> >>>> >>
> >>>> >> Thanks for the changes.
> >>>> >>
> >>>> >> 1. Earlier I commented that creating "unclean.recovery.strategy"
> >>>> >> is not necessary, and we can just reuse the existing
> >>>> >> "unclean.leader.election.enable" configuration key. Let's discuss
> >>>> >> that.
> >>>> >>
> >>>> >> 2. I also don't understand why you didn't add a configuration to
> >>>> >> enable or disable the Unclean Recovery Manager. This seems like a
> >>>> >> very simple way to handle the staging issue which we discussed.
> >>>> >> The URM can just be turned off until it is production ready. Let's
> >>>> >> discuss this.
> >>>> >>
> >>>> >> 3. You still need to describe the changes to AdminClient that are
> >>>> needed
> >>>> >> to use DescribeTopicRequest.
> >>>> >>
> >>>> >> Keep at it. It's looking better. :)
> >>>> >>
> >>>> >> best,
> >>>> >> Colin
> >>>> >>
> >>>> >>
> >>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote:
> >>>> >> > Hi Colin
> >>>> >> > Thanks for the comments!
> >>>> >> >
> >>>> >> > I did the following changes:
> >>>> >> >
> >>>> >> >    1. Simplified the API spec section to only include the diff.
> >>>> >> >    2. Reordered the HWM requirement section.
> >>>> >> >    3. Removed the URM implementation details, keeping only the
> >>>> >> >       necessary characteristics to perform the unclean recovery:
> >>>> >> >       a. When to perform the unclean recovery.
> >>>> >> >       b. Under different configs, how the unclean recovery finds
> >>>> >> >          the leader.
> >>>> >> >       c. How the configs unclean.leader.election.enable and
> >>>> >> >          unclean.recovery.strategy are converted when users
> >>>> >> >          enable/disable the unclean recovery.
> >>>> >> >    4. More details about how we change the admin client.
> >>>> >> >    5. API limits on the GetReplicaLogInfoRequest and
> >>>> >> >       DescribeTopicRequest.
> >>>> >> >    6. Two metrics added:
> >>>> >> >       a. kafka.controller.global_under_min_isr_partition_count
> >>>> >> >       b. kafka.controller.unclean_recovery_finished_count
> >>>> >> >
> >>>> >> >
> >>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe <cmcc...@apache.org> wrote:
> >>>> >> >
> >>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
> >>>> >> >> > Hi Colin
> >>>> >> >> > Thanks for the comments!
> >>>> >> >> >
> >>>> >> >>
> >>>> >> >> Hi Calvin,
> >>>> >> >>
> >>>> >> >> Thanks again for the KIP.
> >>>> >> >>
> >>>> >> >> One meta-comment: it's usually better to just do a diff on a
> >>>> >> >> message spec file or java file if you're including changes to it
> >>>> >> >> in the KIP. This is easier to read than looking for "new fields
> >>>> >> >> begin" etc. in the text, and gracefully handles the case where
> >>>> >> >> existing fields were changed.
> >>>> >> >>
> >>>> >> >> > Rewrite the Additional High Watermark advancement requirement
> >>>> >> >> > There was feedback on this section that some readers may not
> >>>> >> >> > be familiar with HWM and Ack=0,1,all requests. This can help
> >>>> >> >> > them understand the proposal. I will rewrite this part for
> >>>> >> >> > more readability.
> >>>> >> >> >
> >>>> >> >>
> >>>> >> >> To be clear, I wasn't suggesting dropping either section. I
> >>>> >> >> agree that they add useful background. I was just suggesting
> >>>> >> >> that we should discuss the "acks" setting AFTER discussing the
> >>>> >> >> new high watermark advancement conditions. We also should
> >>>> >> >> discuss acks=0. While it isn't conceptually much different than
> >>>> >> >> acks=1 here, its omission from this section is confusing.
> >>>> >> >>
> >>>> >> >> > Unclean recovery
> >>>> >> >> >
> >>>> >> >> > The plan is to replace the unclean.leader.election.enable with
> >>>> >> >> > unclean.recovery.strategy. If the Unclean Recovery is enabled
> >>>> >> >> > then it deals with the three options in the
> >>>> >> >> > unclean.recovery.strategy.
> >>>> >> >> >
> >>>> >> >> > Let's refine the Unclean Recovery. We have already taken a lot
> >>>> >> >> > of suggestions and I hope to enhance the durability of Kafka
> >>>> >> >> > to the next level with this KIP.
> >>>> >> >>
> >>>> >> >> I am OK with doing the unclean leader recovery improvements in
> >>>> >> >> this KIP. However, I think we need to really work on the
> >>>> >> >> configuration settings.
> >>>> >> >>
> >>>> >> >> Configuration overrides are often quite messy. For example, in
> >>>> >> >> the case where we have both log.roll.hours and log.roll.ms, the
> >>>> >> >> user has to remember which one takes precedence, and it is not
> >>>> >> >> obvious. So, rather than creating a new configuration, why not
> >>>> >> >> add additional values to "unclean.leader.election.enable"? I
> >>>> >> >> think this will be simpler for people to understand, and simpler
> >>>> >> >> in the code as well.
> >>>> >> >>
> >>>> >> >> What if we continued to use "unclean.leader.election.enable"
> >>>> >> >> but extended it so that it took a string? Then the string could
> >>>> >> >> have these values:
> >>>> >> >>
> >>>> >> >> never
> >>>> >> >>     never automatically do an unclean leader election under any
> >>>> >> >>     conditions
> >>>> >> >>
> >>>> >> >> false / default
> >>>> >> >>     only do an unclean leader election if there may be possible
> >>>> >> >>     data loss
> >>>> >> >>
> >>>> >> >> true / always
> >>>> >> >>     always do an unclean leader election if we can't immediately
> >>>> >> >>     elect a leader
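> >>>> >> >>
> >>>> >> >> (For concreteness, a sketch of how the broker could parse such a
> >>>> >> >> string value -- the enum name is made up:)
> >>>> >> >>
> >>>> >> >> enum ElectionMode { NEVER, DEFAULT, ALWAYS }  // hypothetical
> >>>> >> >>
> >>>> >> >> static ElectionMode parse(String value) {
> >>>> >> >>     switch (value.toLowerCase(java.util.Locale.ROOT)) {
> >>>> >> >>         case "never":                 return ElectionMode.NEVER;
> >>>> >> >>         case "false": case "default": return ElectionMode.DEFAULT;
> >>>> >> >>         case "true":  case "always":  return ElectionMode.ALWAYS;
> >>>> >> >>         default:
> >>>> >> >>             throw new IllegalArgumentException("Invalid value: " + value);
> >>>> >> >>     }
> >>>> >> >> }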
> >>>> >> >>
> >>>> >> >> It's a bit awkward that false maps to default rather than to
> >>>> >> >> never. But this awkwardness exists if we use two different
> >>>> >> >> configuration keys as well. The reason for the awkwardness is
> >>>> >> >> that we simply don't want most of the people currently setting
> >>>> >> >> unclean.leader.election.enable=false to get the "never"
> >>>> >> >> behavior. We have to bite that bullet. Better to be clear and
> >>>> >> >> explicit than hide it.
> >>>> >> >>
> >>>> >> >> Another thing that's a bit awkward is having two different ways
> >>>> >> >> to do unclean leader election specified in the KIP. You describe
> >>>> >> >> two methods: the simple "choose the last leader" method, and the
> >>>> >> >> "unclean recovery manager" method. I understand why you did it
> >>>> >> >> this way -- "choose the last leader" is simple, and will help us
> >>>> >> >> deliver an implementation quickly, while the URM is preferable
> >>>> >> >> in the long term. My suggestion here is to separate the decision
> >>>> >> >> of HOW to do unclean leader election from the decision of WHEN
> >>>> >> >> to do it.
> >>>> >> >>
> >>>> >> >> So in other words, have "unclean.leader.election.enable"
> >>>> >> >> specify when we do unclean leader election, and have a new
> >>>> >> >> configuration like "unclean.recovery.manager.enable" to
> >>>> >> >> determine if we use the URM. Presumably the URM will take some
> >>>> >> >> time to get fully stable, so this can default to false for a
> >>>> >> >> while, and we can flip the default to true when we feel ready.
> >>>> >> >> The URM is somewhat under-described here. I think we need a few
> >>>> >> >> configurations here for it. For example, we need a
> configuration to
> >>>> >> specify
> >>>> >> >> how long it should wait for a broker to respond to its RPCs
> before
> >>>> >> moving
> >>>> >> >> on. We also need to understand how the URM interacts with
> >>>> >> >> unclean.leader.election.enable=always. I assume that with
> "always"
> >>>> we
> >>>> >> will
> >>>> >> >> just unconditionally use the URM rather than choosing randomly.
> >>>> But this
> >>>> >> >> should be spelled out in the KIP.
> >>>> >> >>
> >>>> >> >> >
> >>>> >> >> > DescribeTopicRequest
> >>>> >> >> >
> >>>> >> >> >    1. Yes, the plan is to replace the MetadataRequest with
> >>>> >> >> >    the DescribeTopicRequest for the admin clients. Will check
> >>>> >> >> >    the details.
> >>>> >> >>
> >>>> >> >> Sounds good. But as I said, you need to specify how AdminClient
> >>>> >> >> interacts with the new request. This will involve adding some
> >>>> >> >> fields to TopicDescription.java. And you need to specify the
> >>>> >> >> changes to the kafka-topics.sh command line tool. Otherwise we
> >>>> >> >> cannot use the tool to see the new information.
> >>>> >> >>
> >>>> >> >> The new requests, DescribeTopicRequest and
> >>>> >> >> GetReplicaLogInfoRequest, need to have limits placed on them so
> >>>> >> >> that their size can't be infinite. We don't want to propagate
> >>>> >> >> the current problems of MetadataRequest, where clients can
> >>>> >> >> request massive responses that can mess up the JVM when handled.
> >>>> >> >>
> >>>> >> >> Adding limits is simple for GetReplicaLogInfoRequest -- we can
> >>>> >> >> just say that only 2000 partitions at a time can be requested.
> >>>> >> >> For DescribeTopicRequest we can probably just limit to 20 topics
> >>>> >> >> or something like that, to avoid the complexity of doing
> >>>> >> >> pagination in this KIP.
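> >>>> >> >>
> >>>> >> >> (On the client side, a 20-topic limit just means batching -- a
> >>>> >> >> sketch, where describeBatch() is a hypothetical stand-in for one
> >>>> >> >> request:)
> >>>> >> >>
> >>>> >> >> // Sketch: one DescribeTopicRequest per batch of at most 20 topics.
> >>>> >> >> for (int i = 0; i < topics.size(); i += 20) {
> >>>> >> >>     List<String> batch =
> >>>> >> >>         topics.subList(i, Math.min(i + 20, topics.size()));
> >>>> >> >>     describeBatch(batch);  // hypothetical helper
> >>>> >> >> }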
> >>>> >> >>
> >>>> >> >> >    2. I can let the broker load the ELR info so that they
> >>>> >> >> >    can serve the DescribeTopicRequest as well.
> >>>> >> >>
> >>>> >> >> Yes, it's fine to add to MetadataCache. In fact, you'll be
> >>>> >> >> loading it anyway once it's added to PartitionImage.
> >>>> >> >>
> >>>> >> >> >    3. Yeah, it does not make sense to have the topic id if
> >>>> >> >> >    DescribeTopicRequest is only used by the admin client.
> >>>> >> >>
> >>>> >> >> OK. That makes things simpler. We can always create a new API
> >>>> >> >> later (hopefully not in this KIP!) to query by topic ID.
> >>>> >> >>
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > Metrics
> >>>> >> >> >
> >>>> >> >> > As for overall cluster health metrics, I think under-min-ISR
> >>>> >> >> > is still a useful one. ELR is more like a safety belt. When
> >>>> >> >> > the ELR is used, the cluster availability has already been
> >>>> >> >> > impacted.
> >>>> >> >> >
> >>>> >> >> > Maybe we can have a metric to count the partitions where
> >>>> >> >> > sum(ISR, ELR) < min ISR. What do you think?
> >>>> >> >>
> >>>> >> >> How about:
> >>>> >> >>
> >>>> >> >> A. a metric for the total number of under-min-isr partitions?
> >>>> >> >> We don't have that in Apache Kafka at the moment.
> >>>> >> >>
> >>>> >> >> B. a metric for the number of unclean leader elections we did
> >>>> >> >> (for simplicity, it can reset to 0 on controller restart: we
> >>>> >> >> expect people to monitor the change over time anyway)
> >>>> >> >>
> >>>> >> >> best,
> >>>> >> >> Colin
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> >
> >>>> >> >> > Yeah, for the ongoing unclean recoveries, the controller can
> >>>> >> >> > keep an accurate count through failover because partition
> >>>> >> >> > registration can indicate whether a recovery is needed.
> >>>> >> >> > However, for the ones that already happened, unless we want to
> >>>> >> >> > persist the number somewhere, we can only figure it out from
> >>>> >> >> > the log.
> >>>> >> >> >
> >>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe
> >>>> >> >> > <cmcc...@apache.org> wrote:
> >>>> >> >> >
> >>>> >> >> >> Also, we should have metrics that show what is going on
> >>>> >> >> >> with regard to the eligible replica set. I'm not sure exactly
> >>>> >> >> >> what to suggest, but something that could identify when
> >>>> >> >> >> things are going wrong in the cluster.
> >>>> >> >> >>
> >>>> >> >> >> For example, maybe a metric for partitions containing
> >>>> >> >> >> replicas that are ineligible to be leader? That would show a
> >>>> >> >> >> spike when a broker had an unclean restart.
> >>>> >> >> >>
> >>>> >> >> >> Ideally, we'd also have a metric that indicates when an
> >>>> >> >> >> unclean leader election or a recovery happened. It's a bit
> >>>> >> >> >> tricky because the simple thing, of tracking it per
> >>>> >> >> >> controller, may be a bit confusing during failovers.
> >>>> >> >> >>
> >>>> >> >> >> best,
> >>>> >> >> >> Colin
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe wrote:
> >>>> >> >> >> > Hi Calvin,
> >>>> >> >> >> >
> >>>> >> >> >> > Thanks for the KIP. I think this is a great improvement.
> >>>> >> >> >> >
> >>>> >> >> >> >> Additional High Watermark advance requirement
> >>>> >> >> >> >
> >>>> >> >> >> > Typo: change "advance" to "advancement"
> >>>> >> >> >> >
> >>>> >> >> >> >> A bit recap of some key concepts.
> >>>> >> >> >> >
> >>>> >> >> >> > Typo: change "bit" to "quick"
> >>>> >> >> >> >
> >>>> >> >> >> >> Ack=1/all produce request. It defines when the Kafka
> >>>> >> >> >> >> server should respond to the produce request
> >>>> >> >> >> >
> >>>> >> >> >> > I think this section would be clearer if we talked about
> >>>> >> >> >> > the new high watermark advancement requirement first, and
> >>>> >> >> >> > THEN talked about its impact on acks=0, acks=1, and
> >>>> >> >> >> > acks=all. acks=all is of course the main case we care
> >>>> >> >> >> > about here, so it would be good to lead with that, rather
> >>>> >> >> >> > than delving into the technicalities of acks=0/1 first.
> >>>> >> >> >> >
> >>>> >> >> >> >> Unclean recovery
> >>>> >> >> >> >
> >>>> >> >> >> > So, here you are introducing a new configuration,
> >>>> >> >> >> > unclean.recovery.strategy. The difficult thing here is
> >>>> >> >> >> > that there is a lot of overlap with
> >>>> >> >> >> > unclean.leader.election.enable. So we have 3 different
> >>>> >> >> >> > settings for unclean.recovery.strategy, plus 2 different
> >>>> >> >> >> > settings for unclean.leader.election.enable, giving a
> >>>> >> >> >> > cross product of 6 different options. The following
> >>>> >> >> >> > "unclean recovery manager" section only applies to one of
> >>>> >> >> >> > those 6 different possibilities (I think?)
> >>>> >> >> >> >
> >>>> >> >> >> > I simply don't think we need so many different election
> >>>> >> >> >> > types. Really the use-cases we need are people who want NO
> >>>> >> >> >> > unclean elections, people who want "the reasonable thing",
> >>>> >> >> >> > and people who want availability at all costs.
> >>>> >> >> >> >
> >>>> >> >> >> > Overall, I feel like the first half of the KIP is about
> >>>> >> >> >> > the ELR, and the second half is about reworking unclean
> >>>> >> >> >> > leader election. It might be better to move that second
> >>>> >> >> >> > half to a separate KIP so that we can figure it out fully.
> >>>> >> >> >> > It should be fine to punt this until later and just have
> >>>> >> >> >> > the current behavior on empty ELR be waiting for the last
> >>>> >> >> >> > known leader to return. After all, that's what we do today.
> >>>> >> >> >> >
> >>>> >> >> >> >> DescribeTopicRequest
> >>>> >> >> >> >
> >>>> >> >> >> > Is the intention for AdminClient to use this RPC for
> >>>> >> >> >> > Admin#describeTopics? If so, we need to describe all of
> >>>> >> >> >> > the changes to the admin client API, as well as changes to
> >>>> >> >> >> > command-line tools like kafka-topics.sh (if there are
> >>>> >> >> >> > any). For example, you will probably need changes to
> >>>> >> >> >> > TopicDescription.java. You will also need to provide all
> >>>> >> >> >> > of the things that admin client needs -- for example,
> >>>> >> >> >> > TopicAuthorizedOperations.
> >>>> >> >> >> >
> >>>> >> >> >> > I also don't think the controller should serve this
> >>>> >> >> >> > request. We want to minimize load on the controller. Just
> >>>> >> >> >> > like with the other metadata requests like MetadataRequest,
> >>>> >> >> >> > this should be served by brokers.
> >>>> >> >> >> >
> >>>> >> >> >> > It's a bit confusing why both topic ID and topic name are
> >>>> >> >> >> > provided to this API. Is the intention that callers should
> >>>> >> >> >> > set one but not the other? Or both? This needs to be
> >>>> >> >> >> > clarified. Also, if we do want to support lookups by UUID,
> >>>> >> >> >> > that is another thing that needs to be added to
> >>>> >> >> >> > adminclient.
> >>>> >> >> >> >
> >>>> >> >> >> > In general, I feel like this should also probably be its
> >>>> >> >> >> > own KIP since it's fairly complex.
> >>>> >> >> >> >
> >>>> >> >> >> > best,
> >>>> >> >> >> > Colin
> >>>> >> >> >> >
> >>>> >> >> >> >
> >>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu wrote:
> >>>> >> >> >> >> Hi everyone,
> >>>> >> >> >> >> I'd like to discuss a series of enhancements to the
> >>>> >> >> >> >> replication protocol.
> >>>> >> >> >> >>
> >>>> >> >> >> >> A partition replica can experience local data loss in
> >>>> >> >> >> >> unclean shutdown scenarios where unflushed data in the OS
> >>>> >> >> >> >> page cache is lost - such as an availability zone power
> >>>> >> >> >> >> outage or a server error. The Kafka replication protocol
> >>>> >> >> >> >> is designed to handle these situations by removing such
> >>>> >> >> >> >> replicas from the ISR and only re-adding them once they
> >>>> >> >> >> >> have caught up and therefore recovered any lost data. This
> >>>> >> >> >> >> prevents replicas that lost an arbitrary log suffix, which
> >>>> >> >> >> >> included committed data, from being elected leader.
> >>>> >> >> >> >> However, there is a "last replica standing" state which,
> >>>> >> >> >> >> when combined with a data loss unclean shutdown event, can
> >>>> >> >> >> >> turn a local data loss scenario into a global data loss
> >>>> >> >> >> >> scenario, i.e., committed data can be removed from all
> >>>> >> >> >> >> replicas. When the last replica in the ISR experiences an
> >>>> >> >> >> >> unclean shutdown and loses committed data, it will be
> >>>> >> >> >> >> reelected leader after starting up again, causing
> >>>> >> >> >> >> rejoining followers to truncate their logs and thereby
> >>>> >> >> >> >> removing the last copies of the committed records which
> >>>> >> >> >> >> the leader lost initially.
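> >>>> >> >> >> >>
> >>>> >> >> >> >> As a concrete (hypothetical) sequence: replicas {A, B, C};
> >>>> >> >> >> >> B and C fall behind and the ISR shrinks to {A}; A then has
> >>>> >> >> >> >> an unclean shutdown and loses a log suffix containing
> >>>> >> >> >> >> committed records; A restarts and, as the last ISR member,
> >>>> >> >> >> >> is elected leader again; when B and C rejoin, they
> >>>> >> >> >> >> truncate to A's shorter log, deleting the last surviving
> >>>> >> >> >> >> copies of those records.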
> >>>> >> >> >> >>
> >>>> >> >> >> >> The new KIP will maximize the protection and provide
> >>>> >> >> >> >> MinISR-1 tolerance to data loss unclean shutdown events.
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >>
> >>>> >> >>
> >>>> >>
> >>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
> >>>> >> >> >>
> >>>> >> >>
> >>>> >>
> >>>>
> >>>
>
