Hi George,
Thanks for addressing the comments. I do not have any more questions.
On Wed, Aug 7, 2019 at 11:08 AM George Li <sql_consult...@yahoo.com.invalid> wrote:
>
> Hi Colin, Satish, Stanislav,
>
> Did I answer all your comments/concerns for KIP-491? Please let me know if you have more questions regarding this feature. I would like to start coding soon. I hope this feature can get into the open source trunk so that every time we upgrade Kafka in our environment, we don't need to cherry-pick this.
>
> BTW, I have added the following to KIP-491 for the auto.leader.rebalance.enable behavior with the new Preferred Leader "Blacklist":
>
> "When auto.leader.rebalance.enable is enabled, the broker(s) in the preferred leader "blacklist" should be excluded from being elected leaders."
>
> Thanks,
> George
>
> On Friday, August 2, 2019, 08:02:07 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:
>
> Hi Colin,
> Thanks for looking into this KIP. Sorry for the late response; I've been busy.
>
> If a cluster has MANY topic partitions, moving this "blacklisted" broker to the end of the replica list is still a rather "big" operation, involving submitting reassignments. The KIP-491 way of blacklisting is much simpler/easier and can be undone easily without changing the replica assignment ordering.
> The major use case for me: a failed broker gets swapped with new hardware and starts up empty (with the latest offset of all partitions). The retention SLA is 1 day, so until this broker has been in-sync for 1 day, we would like to blacklist it from serving traffic. After 1 day, the blacklist is removed and preferred leader election is run. This way, there is no need to run reassignments before/after. This is the "temporary" use case.
>
> There are use cases where this Preferred Leader "blacklist" can be somewhat permanent, as I explained for the AWS data center instances vs. on-premises data center bare metal machines (heterogeneous hardware): the AWS broker_ids will be blacklisted. So newly created topics, or existing topic expansion, would not make them serve traffic even if they could be the preferred leader.
>
> Please let me know if there are more questions.
>
> Thanks,
> George
>
> On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe <cmcc...@apache.org> wrote:
>
> We still want to give the "blacklisted" broker the leadership if nobody else is available. Therefore, isn't putting a broker on the blacklist pretty much the same as moving it to the last entry in the replicas list and then triggering a preferred leader election?
>
> If we want this to be undone after a certain amount of time, or under certain conditions, that seems like something that would be more effectively done by an external system, rather than putting all these policies into Kafka.
>
> best,
> Colin
>
> On Fri, Jul 19, 2019, at 18:23, George Li wrote:
> > Hi Satish,
> > Thanks for the reviews and feedback.
> >
> > > >> The following are the requirements this KIP is trying to accomplish:
> > >
> > > This can be moved to the "Proposed changes" section.
> >
> > Updated KIP-491.
> >
> > > >> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> > >
> > > I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.
> >
> > Yes.
> > The partition assignment remains the same, both replicas and ordering. The blacklist logic can be optimized during implementation.
> >
> > > >> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. A Topic level preferred leader blacklist might be future enhancement work.
> > >
> > > I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require a topic level preferred blacklist?
> >
> > I don't have any concrete use cases for a Topic level preferred leader blacklist. One scenario I can think of: when a broker has high CPU usage, identify the big topics (high MsgIn, high BytesIn, etc.) and try to move their leaders away from this broker. Before doing an actual reassignment to change the preferred leader, put this preferred_leader_blacklist in the Topic level config, run preferred leader election, and see whether CPU decreases for this broker. If yes, then do the reassignments to change the preferred leaders "permanently" (the topic may have many partitions, e.g. 256, quite a few of which have this broker as the preferred leader). So this Topic level config is an easy way to run a trial and check the result.
> >
> > > You can add the below workaround as an item in the rejected alternatives section:
> > > "Reassigning all the topic/partitions which the intended broker is a replica for."
> >
> > Updated KIP-491.
> >
> > Thanks,
> > George
> >
> > On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana <satish.dugg...@gmail.com> wrote:
> >
> > Thanks for the KIP. I have put my comments below.
> >
> > This is a nice improvement to avoid cumbersome maintenance.
> >
> > >> The following are the requirements this KIP is trying to accomplish:
> > >> The ability to add and remove the preferred leader deprioritized list/blacklist, e.g. a new ZK path/node or new dynamic config.
> >
> > This can be moved to the "Proposed changes" section.
> >
> > >> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> >
> > I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.
> >
> > >> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. A Topic level preferred leader blacklist might be future enhancement work.
> >
> > I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require a topic level preferred blacklist?
> >
> > You can add the below workaround as an item in the rejected alternatives section:
> > "Reassigning all the topic/partitions which the intended broker is a replica for."
> >
> > Thanks,
> > Satish.
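To make the semantics discussed above concrete ("skipped until other brokers in the list are unavailable"), here is a minimal, illustrative sketch in Java. It is not the KIP's or the controller's actual code, and all names are made up; it only shows the selection rule: in-sync, live replicas are considered in assignment order, and a blacklisted broker is chosen only when no other candidate exists.

  import java.util.List;
  import java.util.Optional;
  import java.util.Set;

  public class BlacklistAwareLeaderElection {

      // Pick a leader from the assigned replicas: the first in-sync, live, non-blacklisted
      // broker in assignment order; fall back to a blacklisted one only if necessary.
      static Optional<Integer> chooseLeader(List<Integer> assignment,
                                            Set<Integer> isr,
                                            Set<Integer> liveBrokers,
                                            Set<Integer> blacklist) {
          Optional<Integer> fallback = Optional.empty();
          for (int broker : assignment) {
              if (!isr.contains(broker) || !liveBrokers.contains(broker)) {
                  continue;                          // not eligible at all
              }
              if (!blacklist.contains(broker)) {
                  return Optional.of(broker);        // first non-blacklisted candidate wins
              }
              if (!fallback.isPresent()) {
                  fallback = Optional.of(broker);    // remember a blacklisted candidate as last resort
              }
          }
          return fallback;                           // empty if no eligible replica exists
      }

      public static void main(String[] args) {
          List<Integer> assignment = List.of(1, 2, 3);   // broker 1 is the preferred leader
          Set<Integer> blacklist = Set.of(1);

          // All brokers alive and in-sync: broker 2 is elected, broker 1 is skipped.
          System.out.println(chooseLeader(assignment, Set.of(1, 2, 3), Set.of(1, 2, 3), blacklist));
          // Only broker 1 is available: it still becomes leader despite the blacklist.
          System.out.println(chooseLeader(assignment, Set.of(1), Set.of(1), blacklist));
      }
  }

Note that the replica assignment (1, 2, 3) itself is never reordered here, which is the point made above: the blacklist only changes the election priority, not the stored assignment.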
> >
> > On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:
> > >
> > > Hey George,
> > >
> > > Thanks for the KIP, it's an interesting idea.
> > >
> > > I was wondering whether we could achieve the same thing via the kafka-reassign-partitions tool. As you had also said in the JIRA, it is true that this is currently very tedious with the tool. My thoughts are that we could improve the tool and give it the notion of a "blacklisted preferred leader".
> > > This would have some benefits like:
> > > - more fine-grained control over the blacklist. We may not want to blacklist all the preferred leaders, as that would make the blacklisted broker a follower of last resort, which is not very useful. In the case of an underpowered AWS machine or a controller, you might overshoot and make the broker very underutilized if you completely make it leaderless.
> > > - it is not permanent. If we are to have a blacklisted leaders config, rebalancing tools would also need to know about it and manipulate/respect it to achieve a fair balance.
> > > It seems like both problems are tied to balancing partitions; it's just that KIP-491's use case wants to balance them against other factors in a more nuanced way. It makes sense to have both be done from the same place.
> > >
> > > To make note of the motivation section:
> > > > Avoid bouncing broker in order to lose its leadership
> > > The recommended way to make a broker lose its leadership is to run a reassignment on its partitions.
> > > > The cross-data center cluster has AWS cloud instances which have less computing power
> > > We recommend running Kafka on homogeneous machines. It would be cool if the system supported more flexibility in that regard, but that is more nuanced and a preferred leader blacklist may not be the best first approach to the issue.
> > >
> > > Adding a new config which can fundamentally change the way replication is done is complex, both for the system (the replication code is complex enough) and the user. Users would have another potential config that could backfire on them - e.g. if left forgotten.
> > >
> > > Could you think of any downsides to implementing this functionality (or a variation of it) in the kafka-reassign-partitions.sh tool?
> > > One downside I can see is that we would not have it handle new partitions created after the "blacklist operation". As a first iteration I think that may be acceptable.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid> wrote:
> > >
> > > > Hi,
> > > >
> > > > Pinging the list for feedback on KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982).
> > > >
> > > > Thanks,
> > > > George
> > > >
> > > > On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have created KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) for putting a broker into the preferred leader blacklist or deprioritized list, so that when determining leadership it is moved to the lowest priority, for some of the listed use cases.
> > > >
> > > > Please provide your comments/feedback.
> > > >
> > > > Thanks,
> > > > George
> > > >
> > > >
> > > > ----- Forwarded Message -----
> > > > From: Jose Armando Garcia Sancio (JIRA) <j...@apache.org>
> > > > To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>
> > > > Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
> > > > Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
> > > >
> > > > [ https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511 ]
> > > >
> > > > Jose Armando Garcia Sancio commented on KAFKA-8638:
> > > > ---------------------------------------------------
> > > >
> > > > Thanks for feedback and clear use cases [~sql_consulting].
> > > >
> > > > > Preferred Leader Blacklist (deprioritized list)
> > > > > -----------------------------------------------
> > > > >
> > > > > Key: KAFKA-8638
> > > > > URL: https://issues.apache.org/jira/browse/KAFKA-8638
> > > > > Project: Kafka
> > > > > Issue Type: Improvement
> > > > > Components: config, controller, core
> > > > > Affects Versions: 1.1.1, 2.3.0, 2.2.1
> > > > > Reporter: GEORGE LI
> > > > > Assignee: GEORGE LI
> > > > > Priority: Major
> > > > >
> > > > > Currently, the Kafka preferred leader election will pick the broker_id in the topic/partition replica assignment in priority order when the broker is in the ISR. The preferred leader is the broker id in the first position of the replica list. There are use cases where, even if the first broker in the replica assignment is in the ISR, there is a need for it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.
> > > > > Let's use topic/partition replica (1,2,3) as an example. 1 is the preferred leader. When preferred leader election is run, it will pick 1 as the leader if it is in the ISR; if 1 is not online and in the ISR, then it picks 2; if 2 is not in the ISR, then it picks 3 as the leader. There are use cases where, even if 1 is in the ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:
> > > > > * If broker_id 1 is a swapped failed host and is brought up with last segments or the latest offset without historical data (there is another effort on this), it's better for it to not serve leadership till it's caught up.
> > > > > * The cross-data center cluster has AWS instances which have less computing power than the on-prem bare metal machines. We could put the AWS broker_ids in the Preferred Leader Blacklist, so on-prem brokers can be elected leaders, without changing the reassignment ordering of the replicas.
> > > > > * If broker_id 1 is constantly losing leadership after some time ("flapping"), we would want to exclude 1 from being a leader unless all other brokers of this topic/partition are offline. The "flapping" effect was seen in the past when 2 or more brokers were bad; when they lost leadership constantly/quickly, the sets of partition replicas they belong to would see leadership constantly changing. The ultimate solution is to swap these bad hosts.
> > > > > But for quick mitigation, we can also put the bad hosts in the Preferred Leader Blacklist to move the priority of their being elected as leaders to the lowest.
> > > > > * If the controller is busy serving an extra load of metadata requests and other tasks, we would like to move the controller's leaders to other brokers to lower its CPU load. Currently, bouncing the broker to lose leadership does not work for the controller, because after the bounce the controller fails over to another broker.
> > > > > * Avoid bouncing a broker in order to lose its leadership: it would be good if we had a way to specify which broker should be excluded from serving traffic/leadership (without changing the replica assignment ordering by reassignments, even though that's quick), and then run preferred leader election. A bouncing broker will cause temporary URP, and sometimes other issues. Also, a bounce of a broker (e.g. broker_id 1) can temporarily lose all its leadership, but if another broker (e.g. broker_id 2) fails or gets bounced, some of its leadership will likely fail over to broker_id 1 on a replica set with 3 brokers. If broker_id 1 is in the blacklist, then in such a scenario, even if broker_id 2 is offline, the 3rd broker can take leadership.
> > > > > The current workaround for the above is to change the topic/partition's replica assignment to move broker_id 1 from the first position to the last position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need to keep track of the original one and restore it if things change (e.g. the controller fails over to another broker, or the swapped empty broker catches up). That's a rather tedious task.
> > > >
> > > >
> > > > --
> > > > This message was sent by Atlassian JIRA
> > > > (v7.6.3#76005)
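For readers who have not run the workaround described in the ticket: with the tools available in the affected versions it takes two steps, roughly as sketched below. The topic name "foo", partition 0, broker IDs, ZooKeeper address and file names are made-up placeholders, not values from this thread. First submit a reassignment that keeps the same replica set but moves broker 1 to the end, then trigger preferred leader election.

  # reassign.json - same replica set, broker 1 demoted to last: (1,2,3) -> (2,3,1)
  {"version":1,"partitions":[{"topic":"foo","partition":0,"replicas":[2,3,1]}]}

  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file reassign.json --execute

  # election.json - then run preferred leader election for the same partition
  {"partitions":[{"topic":"foo","partition":0}]}

  bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181 \
      --path-to-json-file election.json

Undoing this later means remembering the original ordering and submitting the reverse reassignment for every affected partition, which is exactly the bookkeeping burden the thread describes and KIP-491 aims to remove.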