Catching up with this discussion.

I was just curious -- have we had other instances where downgrading MV is
not supported? I think Kafka typically tries to support downgrades, and I
couldn't think of other examples.

Thanks,
Justine

On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu <ca...@confluent.io.invalid>
wrote:

> Hi Jun,
> 54. Marked that software downgrading is not supported, as the old controller
> will not understand the new PartitionRecord and PartitionChangeRecord.
> Thanks!
>
> On Wed, Oct 4, 2023 at 9:12 AM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Calvin,
> >
> > Thanks for the reply. Just one more comment.
> >
> > 54. It seems that downgrading MV is supported. Is downgrading the
> software
> > version supported? It would be useful to document that.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits
> > <alivsh...@confluent.io.invalid> wrote:
> >
> > > Hi Colin,
> > >
> > > I think in your example "do_unclean_recovery" would need to do
> different
> > > things depending on the strategy.
> > >
> > > do_unclean_recovery() {
> > >   if (unclean.recovery.manager.enabled) {
> > >     if (strategy == Aggressive)
> > >       use UncleanRecoveryManager(waitLastKnownELR=false)  // just inspect logs from whoever is available
> > >     else
> > >       use UncleanRecoveryManager(waitLastKnownELR=true)   // must wait for at least the last known ELR
> > >   } else {
> > >     if (strategy == Aggressive)
> > >       choose the last known leader if that is available, or a random leader if not
> > >     else
> > >       wait for the last known leader to get back
> > >   }
> > > }
> > >
> > > The idea is that the Aggressive strategy would kick in as soon as we lost
> > > the leader and would pick a leader from whoever is available; the Balanced
> > > strategy will only kick in when the ELR is empty and will wait for the
> > > brokers that likely have the most data to become available.
> > >
> > > On Tue, Oct 3, 2023 at 3:04 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >
> > > > On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote:
> > > > > Hi, Calvin,
> > > > >
> > > > > Thanks for the updated KIP. A few more comments.
> > > > >
> > > > > 41. Why would a user choose the option to select a random replica
> as
> > > the
> > > > > leader instead of using unclean.recovery.strategy=Aggressive? It
> seems
> > > > that
> > > > > the latter is strictly better? If that's not the case, could we
> fold
> > > this
> > > > > option under unclean.recovery.strategy instead of introducing a
> > > separate
> > > > > config?
> > > >
> > > > Hi Jun,
> > > >
> > > > I thought the flow of control was:
> > > >
> > > > If there is no leader for the partition {
> > > >   If (there are unfenced ELR members) {
> > > >     choose_an_unfenced_ELR_member
> > > >   } else if (there are fenced ELR members AND strategy=Aggressive) {
> > > >     do_unclean_recovery
> > > >   } else if (there are no ELR members AND strategy != None) {
> > > >     do_unclean_recovery
> > > >   } else {
> > > >     do nothing about the missing leader
> > > >   }
> > > > }
> > > >
> > > > do_unclean_recovery() {
> > > >   if (unclean.recovery.manager.enabled) {
> > > >     use UncleanRecoveryManager
> > > >   } else {
> > > >     choose the last known leader if that is available, or a random leader if not
> > > >   }
> > > > }
> > > >
> > > > However, I think this could be clarified, especially the behavior
> when
> > > > unclean.recovery.manager.enabled=false. Intuitively the goal for
> > > > unclean.recovery.manager.enabled=false is to be "the same as now,
> > mostly"
> > > > but it's very underspecified in the KIP, I agree.
> > > >
> > > > >
> > > > > 50. ElectLeadersRequest: "If more than 20 topics are included, only
> > the
> > > > > first 20 will be served. Others will be returned with
> > DesiredLeaders."
> > > > Hmm,
> > > > > not sure that I understand this. ElectLeadersResponse doesn't have
> a
> > > > > DesiredLeaders field.
> > > > >
> > > > > 51. GetReplicaLogInfo: "If more than 2000 partitions are included,
> > only
> > > > the
> > > > > first 2000 will be served" Do we return an error for the remaining
> > > > > partitions? Actually, should we include an errorCode field at the
> > > > partition
> > > > > level in GetReplicaLogInfoResponse to cover non-existing partitions
> > and
> > > > no
> > > > > authorization, etc?
> > > > >
> > > > > 52. The entry should matches => The entry should match
> > > > >
> > > > > 53. ElectLeadersRequest.DesiredLeaders: Should it be nullable
> since a
> > > > user
> > > > > may not specify DesiredLeaders?
> > > > >
> > > > > 54. Downgrade: Is that indeed possible? I thought earlier you said
> > that
> > > > > once the new version of the records are in the metadata log, one
> > can't
> > > > > downgrade since the old broker doesn't know how to parse the new
> > > version
> > > > of
> > > > > the metadata records?
> > > > >
> > > >
> > > > MetadataVersion downgrade is currently broken, but fixing it is on our
> > > > plate for Kafka 3.7.
> > > >
> > > > The way downgrade works is that "new features" are dropped, leaving
> > only
> > > > the old ones.
> > > >
> > > > > 55. CleanShutdownFile: Should we add a version field for future
> > > > extension?
> > > > >
> > > > > 56. Config changes are public facing. Could we have a separate
> > section
> > > to
> > > > > document all the config changes?
> > > >
> > > > +1. A separate section for this would be good.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Sep 25, 2023 at 4:29 PM Calvin Liu
> > <ca...@confluent.io.invalid
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hi Jun
> > > > >> Thanks for the comments.
> > > > >>
> > > > >> 40. Even if we change it to None, there is no guarantee against data
> > > > >> loss. For users who are not able to validate the data with external
> > > > >> resources, manual intervention does not give a better result, only a
> > > > >> loss of availability. So practically speaking, the Balanced mode would
> > > > >> be a better default value.
> > > > >>
> > > > >> 41. No, it represents how we want to do the unclean leader
> election.
> > > If
> > > > it
> > > > >> is false, the unclean leader election will be the old random way.
> > > > >> Otherwise, the unclean recovery will be used.
> > > > >>
> > > > >> 42. Good catch. Updated.
> > > > >>
> > > > >> 43. Only the first 20 topics will be served. Others will be
> returned
> > > > with
> > > > >> InvalidRequestError
> > > > >>
> > > > >> 44. The order matters. The desired leader entries are matched to the
> > > > >> topic partition list by index.
> > > > >>
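> > > > >> For illustration, a minimal Java sketch of that by-index pairing (the
> > > > >> class below is a hypothetical stand-in, not the KIP's request schema):
> > > > >>
> > > > >> import java.util.List;
> > > > >> import org.apache.kafka.common.TopicPartition;
> > > > >>
> > > > >> // Sketch only: entry i of desiredLeaders is the requested leader for
> > > > >> // entry i of partitions.
> > > > >> public class DesiredLeaderPairingSketch {
> > > > >>     public static void main(String[] args) {
> > > > >>         List<TopicPartition> partitions = List.of(
> > > > >>             new TopicPartition("foo", 0),
> > > > >>             new TopicPartition("foo", 1));
> > > > >>         List<Integer> desiredLeaders = List.of(3, 5); // broker ids, same order
> > > > >>         for (int i = 0; i < partitions.size(); i++) {
> > > > >>             System.out.println(partitions.get(i) + " -> broker " + desiredLeaders.get(i));
> > > > >>         }
> > > > >>     }
> > > > >> }
> > > > >>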
> > > > >> 45. Thanks! Updated.
> > > > >>
> > > > >> 46. Good advice! Updated.
> > > > >>
> > > > >> 47.1, updated the comment. Basically it will elect the replica in
> > the
> > > > >> desiredLeader field to be the leader
> > > > >>
> > > > >> 47.2 We can let the admin client do the conversion. Using the
> > > > desiredLeader
> > > > >> field in the json format seems easier for users.
> > > > >>
> > > > >> 48. Once the MV version is downgraded, all the ELR related fields
> > will
> > > > be
> > > > >> removed on the next partition change. The controller will also
> > ignore
> > > > the
> > > > >> ELR fields. Updated the KIP.
> > > > >>
> > > > >> 49. Yes, it would be deprecated/removed.
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 25, 2023 at 3:49 PM Jun Rao <j...@confluent.io.invalid
> >
> > > > wrote:
> > > > >>
> > > > >> > Hi, Calvin,
> > > > >> >
> > > > >> > Thanks for the updated KIP. Made another pass. A few more
> comments
> > > > below.
> > > > >> >
> > > > >> > 40. unclean.leader.election.enable.false ->
> > > > >> > unclean.recovery.strategy.Balanced: The Balanced mode could
> still
> > > > lead to
> > > > >> > data loss. So, I am wondering if
> > > unclean.leader.election.enable.false
> > > > >> > should map to None?
> > > > >> >
> > > > >> > 41. unclean.recovery.manager.enabled: I am not sure why we
> > introduce
> > > > this
> > > > >> > additional config. Is it the same as
> > unclean.recovery.strategy=None?
> > > > >> >
> > > > >> > 42. DescribeTopicResponse.TopicAuthorizedOperations: Should this
> > be
> > > at
> > > > >> the
> > > > >> > topic level?
> > > > >> >
> > > > >> > 43. "Limit: 20 topics max per request": Could we describe what
> > > > happens if
> > > > >> > the request includes more than 20 topics?
> > > > >> >
> > > > >> > 44. ElectLeadersRequest.DesiredLeaders: Could we describe
> whether
> > > the
> > > > >> > ordering matters?
> > > > >> >
> > > > >> > 45. GetReplicaLogInfo.TopicPartitions: "about": "The topic
> > > partitions
> > > > to
> > > > >> > elect leaders.": The description in "about" is incorrect.
> > > > >> >
> > > > >> > 46. GetReplicaLogInfoResponse: Should we nest partitions under
> > > > topicId to
> > > > >> > be consistent with other types of responses?
> > > > >> >
> > > > >> > 47. kafka-leader-election.sh:
> > > > >> > 47.1 Could we explain DESIGNATION?
> > > > >> > 47.2 desiredLeader: Should it be a list to match the field in
> > > > >> > ElectLeadersRequest?
> > > > >> >
> > > > >> > 48. We could add a section on downgrade?
> > > > >> >
> > > > >> > 49. LastKnownLeader: This seems only needed in the first phase
> of
> > > > >> > delivering ELR. Will it be removed when the complete KIP is
> > > delivered?
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jun
> > > > >> >
> > > > >> > On Tue, Sep 19, 2023 at 1:30 PM Colin McCabe <
> cmcc...@apache.org>
> > > > wrote:
> > > > >> >
> > > > >> > > Hi Calvin,
> > > > >> > >
> > > > >> > > Thanks for the explanations. I like the idea of using none,
> > > > balanced,
> > > > >> > > aggressive. We also had an offline discussion about why it is
> > good
> > > > to
> > > > >> > use a
> > > > >> > > new config key (basically, so that we can deprecate the old
> one
> > > > which
> > > > >> had
> > > > >> > > only false/true values in 4.0). With these changes, I am +1.
> > > > >> > >
> > > > >> > > best,
> > > > >> > > Colin
> > > > >> > >
> > > > >> > > On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote:
> > > > >> > > > Hi Colin,
> > > > >> > > > Also, can we deprecate unclean.leader.election.enable in
> 4.0?
> > > > Before
> > > > >> > > that,
> > > > >> > > > we can have both the config unclean.recovery.strategy and
> > > > >> > > > unclean.leader.election.enable
> > > > >> > > > and use the unclean.recovery.Enabled to determine which
> > config
> > > > to
> > > > >> use
> > > > >> > > > during the unclean leader election.
> > > > >> > > >
> > > > >> > > > On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu <
> > ca...@confluent.io>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > >> Hi Colin,
> > > > >> > > >> For the unclean.recovery.strategy config name, how about we use the
> > > > >> > > >> following:
> > > > >> > > >> None. It basically means no unclean recovery will be performed.
> > > > >> > > >> Aggressive. It means availability goes first. Whenever the partition
> > > > >> > > >> can't elect a durable replica, the controller will try the unclean
> > > > >> > > >> recovery.
> > > > >> > > >> Balanced. It is the balance point between availability first (Aggressive)
> > > > >> > > >> and availability last (None). The controller performs unclean recovery
> > > > >> > > >> when both ISR and ELR are empty.
> > > > >> > > >>
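> > > > >> > > >> To make the three values concrete, a rough Java sketch of how a
> > > > >> > > >> controller could branch on them (names and signature here are
> > > > >> > > >> assumptions, not the actual implementation):
> > > > >> > > >>
> > > > >> > > >> public class UncleanRecoverySketch {
> > > > >> > > >>     public enum Strategy { NONE, BALANCED, AGGRESSIVE }
> > > > >> > > >>
> > > > >> > > >>     // noEligibleLeader: the partition has no leader and no unfenced ISR/ELR member.
> > > > >> > > >>     // elrEmpty: the ELR itself is empty.
> > > > >> > > >>     public static boolean shouldAttemptUncleanRecovery(Strategy strategy,
> > > > >> > > >>                                                        boolean noEligibleLeader,
> > > > >> > > >>                                                        boolean elrEmpty) {
> > > > >> > > >>         switch (strategy) {
> > > > >> > > >>             case AGGRESSIVE: return noEligibleLeader;             // availability first
> > > > >> > > >>             case BALANCED:   return noEligibleLeader && elrEmpty; // no durable replica left
> > > > >> > > >>             default:         return false;                        // NONE: never
> > > > >> > > >>         }
> > > > >> > > >>     }
> > > > >> > > >> }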
> > > > >> > > >>
> > > > >> > > >> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu <
> > > ca...@confluent.io>
> > > > >> > wrote:
> > > > >> > > >>
> > > > >> > > >>> Hi Colin,
> > > > >> > > >>>
> > > > >> > > >>> > So, the proposal is that if someone sets
> > > > >> > > "unclean.leader.election.enable
> > > > >> > > >>> = true"...
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> The idea is to use one of the
> unclean.leader.election.enable
> > > and
> > > > >> > > >>> unclean.recovery.strategy based on the
> > > > unclean.recovery.Enabled. A
> > > > >> > > possible
> > > > >> > > >>> version can be
> > > > >> > > >>>
> > > > >> > > >>> if (unclean.recovery.Enabled) {
> > > > >> > > >>>   Check unclean.recovery.strategy. If set, use it. Otherwise, check
> > > > >> > > >>>   unclean.leader.election.enable and translate it to unclean.recovery.strategy.
> > > > >> > > >>> } else {
> > > > >> > > >>>   Use unclean.leader.election.enable
> > > > >> > > >>> }
> > > > >> > > >>>
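> > > > >> > > >>> A slightly more concrete Java sketch of that resolution (the mapping of
> > > > >> > > >>> the legacy false to Balanced is only an assumption here, since comment 40
> > > > >> > > >>> above is still open):
> > > > >> > > >>>
> > > > >> > > >>> public class StrategyResolutionSketch {
> > > > >> > > >>>     static String effectiveStrategy(boolean recoveryEnabled,
> > > > >> > > >>>                                     String recoveryStrategy,        // null if unset
> > > > >> > > >>>                                     boolean uncleanLeaderElectionEnable) {
> > > > >> > > >>>         if (!recoveryEnabled) {
> > > > >> > > >>>             // old behavior, driven only by unclean.leader.election.enable
> > > > >> > > >>>             return uncleanLeaderElectionEnable ? "random-election" : "no-election";
> > > > >> > > >>>         }
> > > > >> > > >>>         if (recoveryStrategy != null) {
> > > > >> > > >>>             return recoveryStrategy;                            // explicit setting wins
> > > > >> > > >>>         }
> > > > >> > > >>>         // translate the legacy boolean
> > > > >> > > >>>         return uncleanLeaderElectionEnable ? "Aggressive" : "Balanced";
> > > > >> > > >>>     }
> > > > >> > > >>> }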
> > > > >> > > >>>
> > > > >> > > >>> —--------
> > > > >> > > >>>
> > > > >> > > >>> >The configuration key should be
> > > > >> "unclean.recovery.manager.enabled",
> > > > >> > > >>> right?
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> I think we have two ways of choosing a leader uncleanly,
> > > unclean
> > > > >> > leader
> > > > >> > > >>> election and unclean recovery(log inspection) and we try
> to
> > > > switch
> > > > >> > > between
> > > > >> > > >>> them.
> > > > >> > > >>>
> > > > >> > > >>> Do you mean we want to develop two ways of performing the
> > > > unclean
> > > > >> > > >>> recovery and one of them is using “unclean recovery
> > manager”?
> > > I
> > > > >> guess
> > > > >> > > we
> > > > >> > > >>> haven’t discussed the second way.
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> —-------
> > > > >> > > >>>
> > > > >> > > >>> >How do these 4 levels of overrides interact with your new
> > > > >> > > >>> configurations?
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> I do notice that in the KRaft controller code, the method to check
> > > > >> > > >>> whether to perform unclean leader election has been hard coded to false
> > > > >> > > >>> since 2021 (uncleanLeaderElectionEnabledForTopic). Isn't it a good chance
> > > > >> > > >>> to completely deprecate unclean.leader.election.enable? We don't even
> > > > >> > > >>> have to worry about the config conversion.
> > > > >> > > >>>
> > > > >> > > >>> On the other hand, whatever the override is, as long as the controller
> > > > >> > > >>> has the final effective unclean.leader.election.enable, the topic-level
> > > > >> > > >>> config unclean.recovery.strategy, and the cluster-level config
> > > > >> > > >>> unclean.recovery.Enabled, the controller can calculate the correct
> > > > >> > > >>> method to use, right?
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe <
> > > > cmcc...@apache.org>
> > > > >> > > wrote:
> > > > >> > > >>>
> > > > >> > > >>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
> > > > >> > > >>>> > Hi Colin
> > > > >> > > >>>> > 1. I think using the new config name is more clear.
> > > > >> > > >>>> >        a. The unclean leader election is actually
> removed
> > > if
> > > > >> > unclean
> > > > >> > > >>>> > recovery is in use.
> > > > >> > > >>>> >        b. Using multiple values in
> > > > >> unclean.leader.election.enable
> > > > >> > is
> > > > >> > > >>>> > confusing and it will be more confusing after people
> > forget
> > > > >> about
> > > > >> > > this
> > > > >> > > >>>> > discussion.
> > > > >> > > >>>>
> > > > >> > > >>>> Hi Calvin,
> > > > >> > > >>>>
> > > > >> > > >>>> So, the proposal is that if someone sets
> > > > >> > > "unclean.leader.election.enable
> > > > >> > > >>>> = true" but then sets one of your new configurations, the
> > > > value of
> > > > >> > > >>>> unclean.leader.election.enable is ignored? That seems
> less
> > > > clear
> > > > >> to
> > > > >> > > me, not
> > > > >> > > >>>> more. Just in general, having multiple configuration keys
> > to
> > > > >> control
> > > > >> > > the
> > > > >> > > >>>> same thing confuses users. Basically, they are sitting
> at a
> > > > giant
> > > > >> > > control
> > > > >> > > >>>> panel, and some of the levers do nothing.
> > > > >> > > >>>>
> > > > >> > > >>>> > 2. Sorry I forgot to mention in the response that I did
> > add
> > > > the
> > > > >> > > >>>> > unclean.recovery.Enabled flag.
> > > > >> > > >>>>
> > > > >> > > >>>> The configuration key should be
> > > > >> "unclean.recovery.manager.enabled",
> > > > >> > > >>>> right? Because we can do "unclean recovery" without the
> > > > manager.
> > > > >> > > Disabling
> > > > >> > > >>>> the manager just means we use a different mechanism for
> > > > recovery.
> > > > >> > > >>>>
> > > > >> > > >>>> >        c. Maybe I underestimated the challenge of
> > replacing
> > > > the
> > > > >> > > >>>> config. Any
> > > > >> > > >>>> > implementation problems ahead?
> > > > >> > > >>>>
> > > > >> > > >>>> There are four levels of overrides for
> > > > >> > unclean.leader.election.enable.
> > > > >> > > >>>>
> > > > >> > > >>>> 1. static configuration for node.
> > > > >> > > >>>>     This goes in the configuration file, typically named
> > > > >> > > >>>> server.properties
> > > > >> > > >>>>
> > > > >> > > >>>> 2. dynamic configuration for node default
> > > > >> > > >>>>   ConfigResource(type=BROKER, name="")
> > > > >> > > >>>>
> > > > >> > > >>>> 3. dynamic configuration for node
> > > > >> > > >>>>   ConfigResource(type=BROKER, name=<controller id>)
> > > > >> > > >>>>
> > > > >> > > >>>> 4. dynamic configuration for topic
> > > > >> > > >>>>   ConfigResource(type=TOPIC, name=<topic-name>)
> > > > >> > > >>>>
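> > > > >> > > >>>> (A rough sketch of that precedence chain, purely for illustration; this
> > > > >> > > >>>> is not the actual config-resolution code:)
> > > > >> > > >>>>
> > > > >> > > >>>> import java.util.List;
> > > > >> > > >>>> import java.util.Map;
> > > > >> > > >>>>
> > > > >> > > >>>> // Sketch only: check the most specific level first, fall through to the
> > > > >> > > >>>> // static configuration, and finally to the hard-coded default (null here).
> > > > >> > > >>>> public class ConfigPrecedenceSketch {
> > > > >> > > >>>>     static String resolve(String key,
> > > > >> > > >>>>                           Map<String, String> topicConfig,           // 4. TOPIC resource
> > > > >> > > >>>>                           Map<String, String> perNodeDynamic,        // 3. BROKER, name=<id>
> > > > >> > > >>>>                           Map<String, String> clusterDefaultDynamic, // 2. BROKER, name=""
> > > > >> > > >>>>                           Map<String, String> staticConfig) {        // 1. server.properties
> > > > >> > > >>>>         for (Map<String, String> level : List.of(topicConfig, perNodeDynamic,
> > > > >> > > >>>>                                                  clusterDefaultDynamic, staticConfig)) {
> > > > >> > > >>>>             String v = level.get(key);
> > > > >> > > >>>>             if (v != null) {
> > > > >> > > >>>>                 return v;
> > > > >> > > >>>>             }
> > > > >> > > >>>>         }
> > > > >> > > >>>>         return null;
> > > > >> > > >>>>     }
> > > > >> > > >>>> }
> > > > >> > > >>>>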
> > > > >> > > >>>> How do these 4 levels of overrides interact with your new
> > > > >> > > >>>> configurations? If the new configurations dominate over
> the
> > > old
> > > > >> > ones,
> > > > >> > > it
> > > > >> > > >>>> seems like this will get a lot more confusing to
> implement
> > > (and
> > > > >> also
> > > > >> > > to
> > > > >> > > >>>> use.)
> > > > >> > > >>>>
> > > > >> > > >>>> Again, I'd recommend just adding some new values to
> > > > >> > > >>>> unclean.leader.election.enable. It's simple and will
> > prevent
> > > > user
> > > > >> > > confusion
> > > > >> > > >>>> (as well as developer confusion.)
> > > > >> > > >>>>
> > > > >> > > >>>> best,
> > > > >> > > >>>> Colin
> > > > >> > > >>>>
> > > > >> > > >>>>
> > > > >> > > >>>> > 3. About the admin client, I mentioned 3 changes in the
> > > > client.
> > > > >> > > >>>> Anything
> > > > >> > > >>>> > else I missed in the KIP?
> > > > >> > > >>>> >       a. The client will switch to using the new RPC
> > > instead
> > > > of
> > > > >> > > >>>> > MetadataRequest for the topics.
> > > > >> > > >>>> >       b. The TopicPartitionInfo used in
> TopicDescription
> > > > needs
> > > > >> to
> > > > >> > > add
> > > > >> > > >>>> new
> > > > >> > > >>>> > fields related to the ELR.
> > > > >> > > >>>> >       c. The outputs will add the ELR related fields.
> > > > >> > > >>>> >
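> > > > >> > > >>>> > For item b, a rough sketch of the kind of addition being discussed
> > > > >> > > >>>> > (field and method names are assumptions, not the final public API):
> > > > >> > > >>>> >
> > > > >> > > >>>> > import java.util.List;
> > > > >> > > >>>> > import org.apache.kafka.common.Node;
> > > > >> > > >>>> >
> > > > >> > > >>>> > // Sketch only: hypothetical ELR accessors next to the existing
> > > > >> > > >>>> > // TopicPartitionInfo fields (leader, replicas, isr).
> > > > >> > > >>>> > public class TopicPartitionInfoSketch {
> > > > >> > > >>>> >     private final List<Node> eligibleLeaderReplicas;
> > > > >> > > >>>> >     private final List<Node> lastKnownElr;
> > > > >> > > >>>> >
> > > > >> > > >>>> >     public TopicPartitionInfoSketch(List<Node> elr, List<Node> lastKnownElr) {
> > > > >> > > >>>> >         this.eligibleLeaderReplicas = elr;
> > > > >> > > >>>> >         this.lastKnownElr = lastKnownElr;
> > > > >> > > >>>> >     }
> > > > >> > > >>>> >
> > > > >> > > >>>> >     public List<Node> elr() { return eligibleLeaderReplicas; }
> > > > >> > > >>>> >     public List<Node> lastKnownElr() { return lastKnownElr; }
> > > > >> > > >>>> > }
> > > > >> > > >>>> >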
> > > > >> > > >>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe <
> > > > >> cmcc...@apache.org>
> > > > >> > > >>>> wrote:
> > > > >> > > >>>> >
> > > > >> > > >>>> >> Hi Calvin,
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> Thanks for the changes.
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> 1. Earlier I commented that creating
> > > > >> "unclean.recovery.strategy "
> > > > >> > > is
> > > > >> > > >>>> not
> > > > >> > > >>>> >> necessary, and we can just reuse the existing
> > > > >> > > >>>> >> "unclean.leader.election.enable" configuration key.
> > Let's
> > > > >> discuss
> > > > >> > > >>>> that.
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> 2. I also don't understand why you didn't add a
> > > > configuration to
> > > > >> > > >>>> enable or
> > > > >> > > >>>> >> disable the Unclean Recovery Manager. This seems like
> a
> > > very
> > > > >> > simple
> > > > >> > > >>>> way to
> > > > >> > > >>>> >> handle the staging issue which we discussed. The URM
> can
> > > > just
> > > > >> be
> > > > >> > > >>>> turned off
> > > > >> > > >>>> >> until it is production ready. Let's discuss this.
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> 3. You still need to describe the changes to
> AdminClient
> > > > that
> > > > >> are
> > > > >> > > >>>> needed
> > > > >> > > >>>> >> to use DescribeTopicRequest.
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> Keep at it. It's looking better. :)
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> best,
> > > > >> > > >>>> >> Colin
> > > > >> > > >>>> >>
> > > > >> > > >>>> >>
> > > > >> > > >>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote:
> > > > >> > > >>>> >> > Hi Colin
> > > > >> > > >>>> >> > Thanks for the comments!
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> > I did the following changes
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    1.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    Simplified the API spec section to only include
> the
> > > > diff.
> > > > >> > > >>>> >> >    2.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    Reordered the HWM requirement section.
> > > > >> > > >>>> >> >    3.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    Removed the URM implementation details to keep
> the
> > > > >> necessary
> > > > >> > > >>>> >> >    characteristics to perform the unclean recovery.
> > > > >> > > >>>> >> >    1.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >       When to perform the unclean recovery
> > > > >> > > >>>> >> >       2.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >       Under different config, how the unclean
> recovery
> > > > finds
> > > > >> > the
> > > > >> > > >>>> leader.
> > > > >> > > >>>> >> >       3.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >       How the config unclean.leader.election.enable
> > and
> > > > >> > > >>>> >> >       unclean.recovery.strategy are converted when
> > users
> > > > >> > > >>>> enable/disable
> > > > >> > > >>>> >> the
> > > > >> > > >>>> >> >       unclean recovery.
> > > > >> > > >>>> >> >       4.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    More details about how we change admin client.
> > > > >> > > >>>> >> >    5.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    API limits on the GetReplicaLogInfoRequest and
> > > > >> > > >>>> DescribeTopicRequest.
> > > > >> > > >>>> >> >    6.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >    Two metrics added
> > > > >> > > >>>> >> >    1.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >
> > > >  Kafka.controller.global_under_min_isr_partition_count
> > > > >> > > >>>> >> >       2.
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >
>  kafka.controller.unclean_recovery_finished_count
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe <
> > > > >> > > cmcc...@apache.org>
> > > > >> > > >>>> >> wrote:
> > > > >> > > >>>> >> >
> > > > >> > > >>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
> > > > >> > > >>>> >> >> > Hi Colin
> > > > >> > > >>>> >> >> > Thanks for the comments!
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Hi Calvin,
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Thanks again for the KIP.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> One meta-comment: it's usually better to just do a
> > diff
> > > > on a
> > > > >> > > >>>> message
> > > > >> > > >>>> >> spec
> > > > >> > > >>>> >> >> file or java file if you're including changes to it
> > in
> > > > the
> > > > >> > KIP.
> > > > >> > > >>>> This is
> > > > >> > > >>>> >> >> easier to read than looking for "new fields begin"
> > etc.
> > > > in
> > > > >> the
> > > > >> > > >>>> text, and
> > > > >> > > >>>> >> >> gracefully handles the case where existing fields
> > were
> > > > >> > changed.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> > Rewrite the Additional High Watermark advancement
> > > > >> > requirement
> > > > >> > > >>>> >> >> > There was feedback on this section that some
> > readers
> > > > may
> > > > >> not
> > > > >> > > be
> > > > >> > > >>>> >> familiar
> > > > >> > > >>>> >> >> > with HWM and Ack=0,1,all requests. This can help
> > them
> > > > >> > > understand
> > > > >> > > >>>> the
> > > > >> > > >>>> >> >> > proposal. I will rewrite this part for more
> > > > readability.
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> To be clear, I wasn't suggesting dropping either
> > > > section. I
> > > > >> > > agree
> > > > >> > > >>>> that
> > > > >> > > >>>> >> >> they add useful background. I was just suggesting
> > that
> > > we
> > > > >> > should
> > > > >> > > >>>> discuss
> > > > >> > > >>>> >> >> the "acks" setting AFTER discussing the new high
> > > > watermark
> > > > >> > > >>>> advancement
> > > > >> > > >>>> >> >> conditions. We also should discuss acks=0. While it
> > > isn't
> > > > >> > > >>>> conceptually
> > > > >> > > >>>> >> much
> > > > >> > > >>>> >> >> different than acks=1 here, its omission from this
> > > > section
> > > > >> is
> > > > >> > > >>>> confusing.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> > Unclean recovery
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > The plan is to replace the
> > > > unclean.leader.election.enable
> > > > >> > with
> > > > >> > > >>>> >> >> > unclean.recovery.strategy. If the Unclean
> Recovery
> > is
> > > > >> > enabled
> > > > >> > > >>>> then it
> > > > >> > > >>>> >> >> deals
> > > > >> > > >>>> >> >> > with the three options in the
> > > > unclean.recovery.strategy.
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > Let’s refine the Unclean Recovery. We have
> already
> > > > taken a
> > > > >> > > lot of
> > > > >> > > >>>> >> >> > suggestions and I hope to enhance the durability
> of
> > > > Kafka
> > > > >> to
> > > > >> > > the
> > > > >> > > >>>> next
> > > > >> > > >>>> >> >> level
> > > > >> > > >>>> >> >> > with this KIP.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> I am OK with doing the unclean leader recovery
> > > > improvements
> > > > >> in
> > > > >> > > >>>> this KIP.
> > > > >> > > >>>> >> >> However, I think we need to really work on the
> > > > configuration
> > > > >> > > >>>> settings.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Configuration overrides are often quite messy. For
> > > > example,
> > > > >> > the
> > > > >> > > >>>> cases
> > > > >> > > >>>> >> >> where we have log.roll.hours and
> log.roll.segment.ms
> > ,
> > > > the
> > > > >> > user
> > > > >> > > >>>> has to
> > > > >> > > >>>> >> >> remember which one takes precedence, and it is not
> > > > obvious.
> > > > >> > So,
> > > > >> > > >>>> rather
> > > > >> > > >>>> >> than
> > > > >> > > >>>> >> >> creating a new configuration, why not add
> additional
> > > > values
> > > > >> to
> > > > >> > > >>>> >> >> "unclean.leader.election.enable"? I think this will
> > be
> > > > >> simpler
> > > > >> > > for
> > > > >> > > >>>> >> people
> > > > >> > > >>>> >> >> to understand, and simpler in the code as well.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> What if we continued to use
> > > > "unclean.leader.election.enable"
> > > > >> > but
> > > > >> > > >>>> >> extended
> > > > >> > > >>>> >> >> it so that it took a string? Then the string could
> > have
> > > > >> these
> > > > >> > > >>>> values:
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> never
> > > > >> > > >>>> >> >>     never automatically do an unclean leader election under any conditions
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> false / default
> > > > >> > > >>>> >> >>     only do an unclean leader election if there may be possible data loss
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> true / always
> > > > >> > > >>>> >> >>     always do an unclean leader election if we can't immediately elect a leader
> > > > >> > > >>>> >> >>
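> > > > >> > > >>>> >> >> A tiny Java sketch of parsing that extended value set (illustrative
> > > > >> > > >>>> >> >> only; note that both the legacy booleans and the new words are accepted):
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> import java.util.Locale;
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> public class ElectionValueParsingSketch {
> > > > >> > > >>>> >> >>     enum ElectionPolicy { NEVER, DEFAULT, ALWAYS }
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >>     static ElectionPolicy parse(String value) {
> > > > >> > > >>>> >> >>         switch (value.toLowerCase(Locale.ROOT)) {
> > > > >> > > >>>> >> >>             case "never":                 return ElectionPolicy.NEVER;
> > > > >> > > >>>> >> >>             case "false": case "default": return ElectionPolicy.DEFAULT;
> > > > >> > > >>>> >> >>             case "true":  case "always":  return ElectionPolicy.ALWAYS;
> > > > >> > > >>>> >> >>             default: throw new IllegalArgumentException("Unknown value: " + value);
> > > > >> > > >>>> >> >>         }
> > > > >> > > >>>> >> >>     }
> > > > >> > > >>>> >> >> }
> > > > >> > > >>>> >> >>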
> > > > >> > > >>>> >> >> It's a bit awkward that false maps to default
> rather
> > > > than to
> > > > >> > > >>>> never. But
> > > > >> > > >>>> >> >> this awkwardness exists if we use two different
> > > > >> configuration
> > > > >> > > keys
> > > > >> > > >>>> as
> > > > >> > > >>>> >> well.
> > > > >> > > >>>> >> >> The reason for the awkwardness is that we simply
> > don't
> > > > want
> > > > >> > most
> > > > >> > > >>>> of the
> > > > >> > > >>>> >> >> people currently setting
> > > > >> unclean.leader.election.enable=false
> > > > >> > to
> > > > >> > > >>>> get the
> > > > >> > > >>>> >> >> "never" behavior. We have to bite that bullet.
> Better
> > > to
> > > > be
> > > > >> > > clear
> > > > >> > > >>>> and
> > > > >> > > >>>> >> >> explicit than hide it.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Another thing that's a bit awkward is having two
> > > > different
> > > > >> > ways
> > > > >> > > to
> > > > >> > > >>>> do
> > > > >> > > >>>> >> >> unclean leader election specified in the KIP. You
> > > > descirbe
> > > > >> two
> > > > >> > > >>>> methods:
> > > > >> > > >>>> >> the
> > > > >> > > >>>> >> >> simple "choose the last leader" method, and the
> > > "unclean
> > > > >> > > recovery
> > > > >> > > >>>> >> manager"
> > > > >> > > >>>> >> >> method. I understand why you did it this way --
> > "choose
> > > > the
> > > > >> > last
> > > > >> > > >>>> >> leader" is
> > > > >> > > >>>> >> >> simple, and will help us deliver an implementation
> > > > quickly,
> > > > >> > > while
> > > > >> > > >>>> the
> > > > >> > > >>>> >> URM
> > > > >> > > >>>> >> >> is preferable in the long term. My suggestion here
> is
> > > to
> > > > >> > > separate
> > > > >> > > >>>> the
> > > > >> > > >>>> >> >> decision of HOW to do unclean leader election from
> > the
> > > > >> > decision
> > > > >> > > of
> > > > >> > > >>>> WHEN
> > > > >> > > >>>> >> to
> > > > >> > > >>>> >> >> do it.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> So in other words, have
> > > "unclean.leader.election.enable"
> > > > >> > specify
> > > > >> > > >>>> when we
> > > > >> > > >>>> >> >> do unclean leader election, and have a new
> > > configuration
> > > > >> like
> > > > >> > > >>>> >> >> "unclean.recovery.manager.enable" to determine if
> we
> > > use
> > > > the
> > > > >> > > URM.
> > > > >> > > >>>> >> >> Presumably the URM will take some time to get fully
> > > > stable,
> > > > >> so
> > > > >> > > >>>> this can
> > > > >> > > >>>> >> >> default to false for a while, and we can flip the
> > > > default to
> > > > >> > > true
> > > > >> > > >>>> when
> > > > >> > > >>>> >> we
> > > > >> > > >>>> >> >> feel ready.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> The URM is somewhat under-described here. I think
> we
> > > > need a
> > > > >> > few
> > > > >> > > >>>> >> >> configurations here for it. For example, we need a
> > > > >> > > configuration to
> > > > >> > > >>>> >> specify
> > > > >> > > >>>> >> >> how long it should wait for a broker to respond to
> > its
> > > > RPCs
> > > > >> > > before
> > > > >> > > >>>> >> moving
> > > > >> > > >>>> >> >> on. We also need to understand how the URM
> interacts
> > > with
> > > > >> > > >>>> >> >> unclean.leader.election.enable=always. I assume
> that
> > > with
> > > > >> > > "always"
> > > > >> > > >>>> we
> > > > >> > > >>>> >> will
> > > > >> > > >>>> >> >> just unconditionally use the URM rather than
> choosing
> > > > >> > randomly.
> > > > >> > > >>>> But this
> > > > >> > > >>>> >> >> should be spelled out in the KIP.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > DescribeTopicRequest
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> >    1.
> > > > >> > > >>>> >> >> >    Yes, the plan is to replace the
> MetadataRequest
> > > with
> > > > >> the
> > > > >> > > >>>> >> >> >    DescribeTopicRequest for the admin clients.
> Will
> > > > check
> > > > >> > the
> > > > >> > > >>>> details.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Sounds good. But as I said, you need to specify how
> > > > >> > AdminClient
> > > > >> > > >>>> >> interacts
> > > > >> > > >>>> >> >> with the new request. This will involve adding some
> > > > fields
> > > > >> to
> > > > >> > > >>>> >> >> TopicDescription.java. And you need to specify the
> > > > changes
> > > > >> to
> > > > >> > > the
> > > > >> > > >>>> >> >> kafka-topics.sh command line tool. Otherwise we
> > cannot
> > > > use
> > > > >> the
> > > > >> > > >>>> tool to
> > > > >> > > >>>> >> see
> > > > >> > > >>>> >> >> the new information.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> The new requests, DescribeTopicRequest and
> > > > >> > > >>>> GetReplicaLogInfoRequest,
> > > > >> > > >>>> >> need
> > > > >> > > >>>> >> >> to have limits placed on them so that their size
> > can't
> > > be
> > > > >> > > >>>> infinite. We
> > > > >> > > >>>> >> >> don't want to propagate the current problems of
> > > > >> > MetadataRequest,
> > > > >> > > >>>> where
> > > > >> > > >>>> >> >> clients can request massive responses that can mess
> > up
> > > > the
> > > > >> JVM
> > > > >> > > when
> > > > >> > > >>>> >> handled.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Adding limits is simple for
> GetReplicaLogInfoRequest
> > --
> > > > we
> > > > >> can
> > > > >> > > >>>> just say
> > > > >> > > >>>> >> >> that only 2000 partitions at a time can be
> requested.
> > > For
> > > > >> > > >>>> >> >> DescribeTopicRequest we can probably just limit to
> 20
> > > > topics
> > > > >> > or
> > > > >> > > >>>> >> something
> > > > >> > > >>>> >> >> like that, to avoid the complexity of doing
> > pagination
> > > in
> > > > >> this
> > > > >> > > KIP.
> > > > >> > > >>>> >> >>
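> > > > >> > > >>>> >> >> A small sketch of enforcing such a cap on the server side (the limits
> > > > >> > > >>>> >> >> are the ones mentioned above; everything else is an assumption):
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> // Sketch only: serve at most `limit` entries; the caller would answer
> > > > >> > > >>>> >> >> // the rest with an error (or, in a later KIP, a pagination cursor).
> > > > >> > > >>>> >> >> public class RequestLimitSketch {
> > > > >> > > >>>> >> >>     static final int MAX_TOPICS_PER_DESCRIBE = 20;
> > > > >> > > >>>> >> >>     static final int MAX_PARTITIONS_PER_LOG_INFO = 2000;
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >>     static <T> java.util.List<T> served(java.util.List<T> requested, int limit) {
> > > > >> > > >>>> >> >>         return requested.size() <= limit ? requested : requested.subList(0, limit);
> > > > >> > > >>>> >> >>     }
> > > > >> > > >>>> >> >> }
> > > > >> > > >>>> >> >>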
> > > > >> > > >>>> >> >> >    2.
> > > > >> > > >>>> >> >> >    I can let the broker load the ELR info so that
> > > they
> > > > can
> > > > >> > > serve
> > > > >> > > >>>> the
> > > > >> > > >>>> >> >> >    DescribeTopicRequest as well.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> Yes, it's fine to add to MetadataCache. In fact,
> > you'll
> > > > be
> > > > >> > > loading
> > > > >> > > >>>> it
> > > > >> > > >>>> >> >> anyway once it's added to PartitionImage.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> >    3.
> > > > >> > > >>>> >> >> >    Yeah, it does not make sense to have the topic
> > id
> > > if
> > > > >> > > >>>> >> >> >    DescribeTopicRequest is only used by the admin
> > > > client.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> OK. That makes things simpler. We can always
> create a
> > > new
> > > > >> API
> > > > >> > > later
> > > > >> > > >>>> >> >> (hopefully not in this KIP!) to query by topic ID.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > Metrics
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > As for overall cluster health metrics, I think
> > > > >> under-min-ISR
> > > > >> > > is
> > > > >> > > >>>> still
> > > > >> > > >>>> >> a
> > > > >> > > >>>> >> >> > useful one. ELR is more like a safety belt. When
> > the
> > > > ELR
> > > > >> is
> > > > >> > > >>>> used, the
> > > > >> > > >>>> >> >> > cluster availability has already been impacted.
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > Maybe we can have a metric to count the
> partitions
> > > that
> > > > >> > > sum(ISR,
> > > > >> > > >>>> ELR)
> > > > >> > > >>>> >> <
> > > > >> > > >>>> >> >> min
> > > > >> > > >>>> >> >> > ISR. What do you think?
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> How about:
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> A.  a metric for the total number of under-min-isr
> > > > >> > partitions?
> > > > >> > > We
> > > > >> > > >>>> don't
> > > > >> > > >>>> >> >> have that in Apache Kafka at the moment.
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> B. a metric for the number of unclean leader
> > elections
> > > we
> > > > >> did
> > > > >> > > (for
> > > > >> > > >>>> >> >> simplicity, it can reset to 0 on controller
> restart:
> > we
> > > > >> expect
> > > > >> > > >>>> people to
> > > > >> > > >>>> >> >> monitor the change over time anyway)
> > > > >> > > >>>> >> >>
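> > > > >> > > >>>> >> >> A rough sketch of how those two could be exposed with the Yammer metrics
> > > > >> > > >>>> >> >> library the brokers already use (metric names and the counting callback
> > > > >> > > >>>> >> >> are assumptions):
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> import com.yammer.metrics.Metrics;
> > > > >> > > >>>> >> >> import com.yammer.metrics.core.Counter;
> > > > >> > > >>>> >> >> import com.yammer.metrics.core.Gauge;
> > > > >> > > >>>> >> >> import com.yammer.metrics.core.MetricName;
> > > > >> > > >>>> >> >> import java.util.function.IntSupplier;
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> // Sketch only: (A) a gauge for under-min-ISR partitions, (B) a counter
> > > > >> > > >>>> >> >> // for unclean elections that simply resets on controller restart.
> > > > >> > > >>>> >> >> public class ElrMetricsSketch {
> > > > >> > > >>>> >> >>     private final Counter uncleanElections = Metrics.newCounter(
> > > > >> > > >>>> >> >>         new MetricName("kafka.controller", "KafkaController", "UncleanLeaderElectionCount"));
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >>     public ElrMetricsSketch(IntSupplier underMinIsrCount) {
> > > > >> > > >>>> >> >>         Metrics.newGauge(
> > > > >> > > >>>> >> >>             new MetricName("kafka.controller", "KafkaController", "GlobalUnderMinIsrPartitionCount"),
> > > > >> > > >>>> >> >>             new Gauge<Integer>() {
> > > > >> > > >>>> >> >>                 @Override
> > > > >> > > >>>> >> >>                 public Integer value() { return underMinIsrCount.getAsInt(); }
> > > > >> > > >>>> >> >>             });
> > > > >> > > >>>> >> >>     }
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >>     public void recordUncleanElection() { uncleanElections.inc(); }
> > > > >> > > >>>> >> >> }
> > > > >> > > >>>> >> >>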
> > > > >> > > >>>> >> >> best,
> > > > >> > > >>>> >> >> Colin
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > Yeah, for the ongoing unclean recoveries, the
> > > > controller
> > > > >> can
> > > > >> > > >>>> keep an
> > > > >> > > >>>> >> >> > accurate count through failover because partition
> > > > >> > registration
> > > > >> > > >>>> can
> > > > >> > > >>>> >> >> indicate
> > > > >> > > >>>> >> >> > whether a recovery is needed. However, for the
> > > happened
> > > > >> > ones,
> > > > >> > > >>>> unless
> > > > >> > > >>>> >> we
> > > > >> > > >>>> >> >> > want to persist the number somewhere, we can only
> > > > figure
> > > > >> it
> > > > >> > > out
> > > > >> > > >>>> from
> > > > >> > > >>>> >> the
> > > > >> > > >>>> >> >> > log.
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe <
> > > > >> > > cmcc...@apache.org
> > > > >> > > >>>> >
> > > > >> > > >>>> >> wrote:
> > > > >> > > >>>> >> >> >
> > > > >> > > >>>> >> >> >> Also, we should have metrics that show what is
> > going
> > > > on
> > > > >> > with
> > > > >> > > >>>> regard
> > > > >> > > >>>> >> to
> > > > >> > > >>>> >> >> the
> > > > >> > > >>>> >> >> >> eligible replica set. I'm not sure exactly what
> to
> > > > >> suggest,
> > > > >> > > but
> > > > >> > > >>>> >> >> something
> > > > >> > > >>>> >> >> >> that could identify when things are going wrong
> in
> > > the
> > > > >> > > cluster.
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >> >> For example, maybe a metric for partitions
> > > containing
> > > > >> > > replicas
> > > > >> > > >>>> that
> > > > >> > > >>>> >> are
> > > > >> > > >>>> >> >> >> ineligible to be leader? That would show a spike
> > > when
> > > > a
> > > > >> > > broker
> > > > >> > > >>>> had an
> > > > >> > > >>>> >> >> >> unclean restart.
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >> >> Ideally, we'd also have a metric that indicates
> > when
> > > > an
> > > > >> > > unclean
> > > > >> > > >>>> >> leader
> > > > >> > > >>>> >> >> >> election or a recovery happened. It's a bit
> tricky
> > > > >> because
> > > > >> > > the
> > > > >> > > >>>> simple
> > > > >> > > >>>> >> >> >> thing, of tracking it per controller, may be a
> bit
> > > > >> > confusing
> > > > >> > > >>>> during
> > > > >> > > >>>> >> >> >> failovers.
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >> >> best,
> > > > >> > > >>>> >> >> >> Colin
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe
> > wrote:
> > > > >> > > >>>> >> >> >> > Hi Calvin,
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > Thanks for the KIP. I think this is a great
> > > > >> improvement.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >> Additional High Watermark advance requirement
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > Typo: change "advance" to "advancement"
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >> A bit recap of some key concepts.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > Typo: change "bit" to "quick"
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >> Ack=1/all produce request. It defines when
> the
> > > > Kafka
> > > > >> > > server
> > > > >> > > >>>> should
> > > > >> > > >>>> >> >> >> respond to the produce request
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > I think this section would be clearer if we
> > talked
> > > > >> about
> > > > >> > > the
> > > > >> > > >>>> new
> > > > >> > > >>>> >> high
> > > > >> > > >>>> >> >> >> > watermark advancement requirement first, and
> > THEN
> > > > >> talked
> > > > >> > > >>>> about its
> > > > >> > > >>>> >> >> >> > impact on acks=0, acks=1, and     acks=all.
> > > > acks=all
> > > > >> is
> > > > >> > of
> > > > >> > > >>>> course
> > > > >> > > >>>> >> the
> > > > >> > > >>>> >> >> >> > main case we care about here, so it would be
> > good
> > > to
> > > > >> lead
> > > > >> > > with
> > > > >> > > >>>> >> that,
> > > > >> > > >>>> >> >> >> > rather than delving into the technicalities of
> > > > acks=0/1
> > > > >> > > first.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >> Unclean recovery
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > So, here you are introducing a new
> > configuration,
> > > > >> > > >>>> >> >> >> > unclean.recovery.strategy. The difficult thing
> > > here
> > > > is
> > > > >> > that
> > > > >> > > >>>> there
> > > > >> > > >>>> >> is a
> > > > >> > > >>>> >> >> >> > lot of overlap with
> > > unclean.leader.election.enable.
> > > > So
> > > > >> we
> > > > >> > > >>>> have 3
> > > > >> > > >>>> >> >> >> > different settings for
> > unclean.recovery.strategy,
> > > > plus
> > > > >> 2
> > > > >> > > >>>> different
> > > > >> > > >>>> >> >> >> > settings for unclean.leader.election.enable,
> > > giving
> > > > a
> > > > >> > cross
> > > > >> > > >>>> >> product of
> > > > >> > > >>>> >> >> >> > 6 different options. The following "unclean
> > > recovery
> > > > >> > > manager"
> > > > >> > > >>>> >> section
> > > > >> > > >>>> >> >> >> > only applies to one of those 6 different
> > > > possibilities
> > > > >> (I
> > > > >> > > >>>> think?)
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > I simply don't think we need so many different
> > > > election
> > > > >> > > types.
> > > > >> > > >>>> >> Really
> > > > >> > > >>>> >> >> >> > the use-cases we need are people who want NO
> > > unclean
> > > > >> > > >>>> elections,
> > > > >> > > >>>> >> people
> > > > >> > > >>>> >> >> >> > who want "the reasonable thing" and people who
> > > want
> > > > >> > > >>>> availability at
> > > > >> > > >>>> >> >> all
> > > > >> > > >>>> >> >> >> > costs.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > Overall, I feel like the first half of the KIP
> > is
> > > > about
> > > > >> > the
> > > > >> > > >>>> ELR,
> > > > >> > > >>>> >> and
> > > > >> > > >>>> >> >> >> > the second half is about reworking unclean
> > leader
> > > > >> > > election. It
> > > > >> > > >>>> >> might
> > > > >> > > >>>> >> >> be
> > > > >> > > >>>> >> >> >> > better to move that second half to a separate
> > KIP
> > > so
> > > > >> that
> > > > >> > > we
> > > > >> > > >>>> can
> > > > >> > > >>>> >> >> figure
> > > > >> > > >>>> >> >> >> > it out fully. It should be fine to punt this
> > until
> > > > >> later
> > > > >> > > and
> > > > >> > > >>>> just
> > > > >> > > >>>> >> have
> > > > >> > > >>>> >> >> >> > the current behavior on empty ELR be waiting
> for
> > > the
> > > > >> last
> > > > >> > > >>>> known
> > > > >> > > >>>> >> leader
> > > > >> > > >>>> >> >> >> > to return. After all, that's what we do today.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >> DescribeTopicRequest
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > Is the intention for AdminClient to use this
> RPC
> > > for
> > > > >> > > >>>> >> >> >> > Admin#describeTopics ? If so, we need to
> > describe
> > > > all
> > > > >> of
> > > > >> > > the
> > > > >> > > >>>> >> changes
> > > > >> > > >>>> >> >> to
> > > > >> > > >>>> >> >> >> > the admin client API, as well as changes to
> > > > >> command-line
> > > > >> > > >>>> tools like
> > > > >> > > >>>> >> >> >> > kafka-topics.sh (if there are any). For
> example,
> > > you
> > > > >> will
> > > > >> > > >>>> probably
> > > > >> > > >>>> >> >> need
> > > > >> > > >>>> >> >> >> > changes to TopicDescription.java. You will
> also
> > > > need to
> > > > >> > > >>>> provide
> > > > >> > > >>>> >> all of
> > > > >> > > >>>> >> >> >> > the things that admin client needs -- for
> > example,
> > > > >> > > >>>> >> >> >> > TopicAuthorizedOperations.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > I also don't think the controller should serve
> > > this
> > > > >> > > request.
> > > > >> > > >>>> We
> > > > >> > > >>>> >> want
> > > > >> > > >>>> >> >> to
> > > > >> > > >>>> >> >> >> > minimize load on the controller. Just like
> with
> > > the
> > > > >> other
> > > > >> > > >>>> metadata
> > > > >> > > >>>> >> >> >> > requests like MetadataRequest, this should be
> > > > served by
> > > > >> > > >>>> brokers.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > It's a bit confusing why both topic ID and
> topic
> > > > name
> > > > >> are
> > > > >> > > >>>> provided
> > > > >> > > >>>> >> to
> > > > >> > > >>>> >> >> >> > this API. Is the intention that callers should
> > set
> > > > one
> > > > >> > but
> > > > >> > > >>>> not the
> > > > >> > > >>>> >> >> >> > other? Or both? This needs to be clarified.
> > Also,
> > > > if we
> > > > >> > do
> > > > >> > > >>>> want to
> > > > >> > > >>>> >> >> >> > support lookups by UUID, that is another thing
> > > that
> > > > >> needs
> > > > >> > > to
> > > > >> > > >>>> be
> > > > >> > > >>>> >> added
> > > > >> > > >>>> >> >> >> > to adminclient.
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > In general, I feel like this should also
> > probably
> > > be
> > > > >> its
> > > > >> > > own
> > > > >> > > >>>> KIP
> > > > >> > > >>>> >> since
> > > > >> > > >>>> >> >> >> > it's fairly complex
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > best,
> > > > >> > > >>>> >> >> >> > Colin
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> >
> > > > >> > > >>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu
> > wrote:
> > > > >> > > >>>> >> >> >> >> Hi everyone,
> > > > >> > > >>>> >> >> >> >> I'd like to discuss a series of enhancement
> to
> > > the
> > > > >> > > >>>> replication
> > > > >> > > >>>> >> >> protocol.
> > > > >> > > >>>> >> >> >> >>
> > > > >> > > >>>> >> >> >> >> A partition replica can experience local data
> > > loss
> > > > in
> > > > >> > > unclean
> > > > >> > > >>>> >> >> shutdown
> > > > >> > > >>>> >> >> >> >> scenarios where unflushed data in the OS page
> > > > cache is
> > > > >> > > lost
> > > > >> > > >>>> - such
> > > > >> > > >>>> >> >> as an
> > > > >> > > >>>> >> >> >> >> availability zone power outage or a server
> > error.
> > > > The
> > > > >> > > Kafka
> > > > >> > > >>>> >> >> replication
> > > > >> > > >>>> >> >> >> >> protocol is designed to handle these
> situations
> > > by
> > > > >> > > removing
> > > > >> > > >>>> such
> > > > >> > > >>>> >> >> >> replicas
> > > > >> > > >>>> >> >> >> >> from the ISR and only re-adding them once
> they
> > > have
> > > > >> > caught
> > > > >> > > >>>> up and
> > > > >> > > >>>> >> >> >> therefore
> > > > >> > > >>>> >> >> >> >> recovered any lost data. This prevents
> replicas
> > > > that
> > > > >> > lost
> > > > >> > > an
> > > > >> > > >>>> >> >> arbitrary
> > > > >> > > >>>> >> >> >> log
> > > > >> > > >>>> >> >> >> >> suffix, which included committed data, from
> > being
> > > > >> > elected
> > > > >> > > >>>> leader.
> > > > >> > > >>>> >> >> >> >> However, there is a "last replica standing"
> > state
> > > > >> which
> > > > >> > > when
> > > > >> > > >>>> >> combined
> > > > >> > > >>>> >> >> >> with
> > > > >> > > >>>> >> >> >> >> a data loss unclean shutdown event can turn a
> > > local
> > > > >> data
> > > > >> > > loss
> > > > >> > > >>>> >> >> scenario
> > > > >> > > >>>> >> >> >> into
> > > > >> > > >>>> >> >> >> >> a global data loss scenario, i.e., committed
> > data
> > > > can
> > > > >> be
> > > > >> > > >>>> removed
> > > > >> > > >>>> >> from
> > > > >> > > >>>> >> >> >> all
> > > > >> > > >>>> >> >> >> >> replicas. When the last replica in the ISR
> > > > experiences
> > > > >> > an
> > > > >> > > >>>> unclean
> > > > >> > > >>>> >> >> >> shutdown
> > > > >> > > >>>> >> >> >> >> and loses committed data, it will be
> reelected
> > > > leader
> > > > >> > > after
> > > > >> > > >>>> >> starting
> > > > >> > > >>>> >> >> up
> > > > >> > > >>>> >> >> >> >> again, causing rejoining followers to
> truncate
> > > > their
> > > > >> > logs
> > > > >> > > and
> > > > >> > > >>>> >> thereby
> > > > >> > > >>>> >> >> >> >> removing the last copies of the committed
> > records
> > > > >> which
> > > > >> > > the
> > > > >> > > >>>> leader
> > > > >> > > >>>> >> >> lost
> > > > >> > > >>>> >> >> >> >> initially.
> > > > >> > > >>>> >> >> >> >>
> > > > >> > > >>>> >> >> >> >> The new KIP will maximize the protection and
> > > > provides
> > > > >> > > >>>> MinISR-1
> > > > >> > > >>>> >> >> >> tolerance to
> > > > >> > > >>>> >> >> >> >> data loss unclean shutdown events.
> > > > >> > > >>>> >> >> >> >>
> > > > >> > > >>>> >> >> >> >>
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >>
> > > > >> > > >>>>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
> > > > >> > > >>>> >> >> >>
> > > > >> > > >>>> >> >>
> > > > >> > > >>>> >>
> > > > >> > > >>>>
> > > > >> > > >>>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>
