Hi Stanislav,

Thanks for taking the time to review and give feedback.

The Preferred Leader "Blacklist" feature is meant to be temporary in most of 
the use cases listed (I will explain a case which might need to be "permanent" 
below). It's a quick and easy way for the on-call engineer to take away the 
leadership of a problem broker and mitigate Kafka cluster production issues.

Reassignment/rebalance is expensive, especially when it involves moving a 
replica to a different broker. Even keeping the same replicas and only 
changing the preferred leader ordering requires running reassignments (batched 
and staggered in production), and once the issue is resolved (e.g. the empty 
broker has caught up with the retention time, the broker with poor performance 
due to hardware issues has been replaced, the controller has switched, etc.), 
we need to run reassignments again (either roll back the previous 
reassignments or run a rebalance to generate a new plan). As you can see, the 
reassignment approach is tedious. With a Preferred Leader Blacklist, a broker 
can simply be added to it and later removed for the change to take effect.
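
To give a feel for the operator side, here is a rough sketch (just my 
illustration, not a finalized API: it assumes the blacklist ends up as a 
cluster-level dynamic broker config named preferred_leader_blacklist holding a 
comma-separated list of broker ids, as discussed in this thread) of adding and 
then removing broker 1 via the AdminClient:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AlterConfigOp, ConfigEntry}
import org.apache.kafka.common.config.ConfigResource

object BlacklistToggle {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    val admin = AdminClient.create(props)
    // Cluster-wide default broker config (same scope as --entity-default).
    val cluster = new ConfigResource(ConfigResource.Type.BROKER, "")

    // Put broker 1 at the lowest leadership priority (hypothetical config key).
    val add: java.util.Collection[AlterConfigOp] = Collections.singletonList(
      new AlterConfigOp(new ConfigEntry("preferred_leader_blacklist", "1"),
                        AlterConfigOp.OpType.SET))
    admin.incrementalAlterConfigs(Collections.singletonMap(cluster, add))
      .all().get()

    // Later, when the broker is healthy again, simply drop the config.
    val remove: java.util.Collection[AlterConfigOp] = Collections.singletonList(
      new AlterConfigOp(new ConfigEntry("preferred_leader_blacklist", ""),
                        AlterConfigOp.OpType.DELETE))
    admin.incrementalAlterConfigs(Collections.singletonMap(cluster, remove))
      .all().get()

    admin.close()
  }
}

Compare that with generating a reassignment JSON that moves the broker to the 
last replica position for every affected partition, running it in batches, and 
then rolling it back once the broker recovers.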

Below are some answers to your questions.

> - more fine-grained control over the blacklist. we may not want to
> blacklist all the preferred leaders, as that would make the blacklisted
> broker a follower of last resort which is not very useful. In the cases of
> an underpowered AWS machine or a controller, you might overshoot and make
> the broker very underutilized if you completely make it leaderless.

The change currently proposed in KIP-491 is to have the Preferred Leader 
Blacklist at the broker level, since that seems to satisfy most of the use 
cases listed. More fine-grained control can be added later if there is a need 
for a preferred leader blacklist at the topic level (e.g. a new topic-level 
config).

> - is not permanent. If we are to have a blacklist leaders config,
> rebalancing tools would also need to know about it and manipulate/respect
> it to achieve a fair balance.
> It seems like both problems are tied to balancing partitions, it's just
> that KIP-491's use case wants to balance them against other factors in a
> more nuanced way. It makes sense to have both be done from the same place


In most of the use cases, the preferred leader blacklist is temporary. One 
case I can think of that would be somewhat permanent is the cross-data-center 
case with less powerful AWS instances, used for critical data that needs 
protection against data loss from a whole-DC failure. We have 1 on-premise 
data center and 2 AWS data centers, and the topic/partition replicas are 
spread across these 3 DCs.

The Preferred Leader Blacklist would be somewhat permanent in this case. Even 
if we run reassignments to move all preferred leaders to the on-premise 
brokers for existing topics, new topics are always being created and existing 
topics' partitions get expanded for capacity growth, and the new partitions' 
preferred leaders are not guaranteed to land on the on-premise brokers. The 
topic management (create/expand) code would need information about blacklisted 
leaders, which is missing today. With the Preferred Leader Blacklist in place, 
we can make sure the AWS brokers do not serve leadership traffic under normal 
conditions, unless the on-premise brokers are down. It's a better safeguard 
for performance.

> To make note of the motivation section:
> > Avoid bouncing broker in order to lose its leadership
> The recommended way to make a broker lose its leadership is to run a
> reassignment on its partitions


Understood. This new preferred leader blacklist feature is trying to make 
that easier, cleaner, and quicker.

> > The cross-data center cluster has AWS cloud instances which have less
> computing power
> We recommend running Kafka on homogeneous machines. It would be cool if the
> system supported more flexibility in that regard but that is more nuanced
> and a preferred leader blacklist may not be the best first approach to the
> issue
We are aware of the recommendation against heterogeneous hardware in a Kafka 
cluster, but in this case it's more cost-efficient to use AWS than to spin up 
a new on-premise DC nearby with low latency.


> Adding a new config which can fundamentally change the way replication is
> done is complex, both for the system (the replication code is complex
> enough) and the user. Users would have another potential config that could
> backfire on them - e.g if left forgotten.


Actually, this newly proposed dynamic config (e.g. preferred_leader_blacklist) 
should not affect the replication code at all. It just provides more 
information when leadership is determined (moving the brokers in the blacklist 
to the lowest priority), both during preferred leader election and when a 
failed broker's leaderships go to other live brokers.
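
To make that concrete, here is a minimal sketch (my own illustration, not the 
actual controller change) of how the election could treat the blacklist: the 
replica ordering stays untouched, and blacklisted brokers are only sorted to 
the lowest priority among the live, in-sync candidates.

object LeaderPriority {
  def selectLeader(assignedReplicas: Seq[Int],
                   isr: Set[Int],
                   liveBrokers: Set[Int],
                   blacklist: Set[Int]): Option[Int] = {
    // Eligible candidates, kept in the original (preferred) replica order.
    val candidates = assignedReplicas.filter(b => liveBrokers(b) && isr(b))
    // Blacklisted brokers drop to the end; everyone else keeps its priority.
    val (normal, deprioritized) = candidates.partition(b => !blacklist(b))
    (normal ++ deprioritized).headOption
  }
}

// Seq(1, 2, 3), all alive and in ISR, blacklist Set(1)     => Some(2)
// Seq(1, 2, 3), only 1 alive and in ISR, blacklist Set(1)  => Some(1)
//   (the blacklisted broker still takes leadership as a last resort)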

Just like with any other config, users need to understand exactly what it 
does and add/remove it according to the issue/situation they face.

For the "permanent" use cases, this config can stay unless broker ids in the 
blacklist changes.

For the "temporary" use cases, after Zookeeper version 3.5.3, there is a new 
feature to create a zk node with TTL  e.g. in the use case of a failed broker 
getting replaced with an empty broker that starts replication using latest 
offsets, this broker can be put in the Preferred Leader Blacklist with a TTL of 
the broker retention time , e.g. 6 hours, after 6 hours this broker has full 
data, the blacklist config in zk will be removed automatically, and this broker 
can serve traffic its preferred leader partitions.  Before upgrading to ZK 
3.5.5 (KAFKA-8634) , the user will need to manually add/remove this dynamic 
config.
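
As an illustration of the TTL idea (a sketch only: the znode path, the empty 
payload and the one-child-per-broker layout are made up here, they are not 
part of KIP-491; TTL nodes also require the ZK servers to run with 
zookeeper.extendedTypesEnabled=true):

import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

object TtlBlacklistEntry {
  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper("zk1:2181", 30000, new Watcher {
      override def process(event: WatchedEvent): Unit = ()
    })
    // Blacklist broker 1 for one retention period (6 hours); ZooKeeper deletes
    // the node by itself once the TTL expires, so no manual cleanup is needed.
    // (Assumes the parent znode /preferred_leader_blacklist already exists.)
    val sixHoursMs = 6L * 60 * 60 * 1000
    zk.create("/preferred_leader_blacklist/1",   // hypothetical path
              Array.emptyByteArray,
              ZooDefs.Ids.OPEN_ACL_UNSAFE,
              CreateMode.PERSISTENT_WITH_TTL,
              null,                              // returned Stat not needed
              sixHoursMs)
    zk.close()
  }
}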

> Could you think of any downsides to implementing this functionality (or a
> variation of it) in the kafka-reassign-partitions.sh tool?
> One downside I can see is that we would not have it handle new partitions
> created after the "blacklist operation". As a first iteration I think that
> may be acceptable

kafka-reassign-partitions.sh is just for submitting reassignment plans. I 
think many users have their own in-house rebalancing algorithms to generate 
the reassignment plans (based on brokers' CPU, disk usage, leadership bytes 
served, BytesIn, etc.). The preferred leader blacklist could be additional 
input feeding those rebalancing algorithms. But for KIP-491, the main goal is 
to avoid any reassignment changes, because in most cases the blacklist is 
temporary. It makes it easier and quicker for the user to move a broker to 
the lowest priority for leader election.

As for the "downside" you pointed out above regarding newly created 
partitions, the preferred leader blacklist should solve it.

I hope I have explained the "Preferred Leader Blacklist" feature clearly. 
Please let me know if there are more questions.


Thanks, 
George


    On Friday, July 19, 2019, 07:33:02 AM PDT, Stanislav Kozlovski 
<stanis...@confluent.io> wrote:  
 
 Hey George,

Thanks for the KIP, it's an interesting idea.

I was wondering whether we could achieve the same thing via the
kafka-reassign-partitions tool. As you had also said in the JIRA,  it is
true that this is currently very tedious with the tool. My thoughts are
that we could improve the tool and give it the notion of a "blacklisted
preferred leader".
This would have some benefits like:
- more fine-grained control over the blacklist. we may not want to
blacklist all the preferred leaders, as that would make the blacklisted
broker a follower of last resort which is not very useful. In the cases of
an underpowered AWS machine or a controller, you might overshoot and make
the broker very underutilized if you completely make it leaderless.
- is not permanent. If we are to have a blacklist leaders config,
rebalancing tools would also need to know about it and manipulate/respect
it to achieve a fair balance.
It seems like both problems are tied to balancing partitions, it's just
that KIP-491's use case wants to balance them against other factors in a
more nuanced way. It makes sense to have both be done from the same place

To make note of the motivation section:
> Avoid bouncing broker in order to lose its leadership
The recommended way to make a broker lose its leadership is to run a
reassignment on its partitions
> The cross-data center cluster has AWS cloud instances which have less
computing power
We recommend running Kafka on homogeneous machines. It would be cool if the
system supported more flexibility in that regard but that is more nuanced
and a preferred leader blacklist may not be the best first approach to the
issue

Adding a new config which can fundamentally change the way replication is
done is complex, both for the system (the replication code is complex
enough) and the user. Users would have another potential config that could
backfire on them - e.g if left forgotten.

Could you think of any downsides to implementing this functionality (or a
variation of it) in the kafka-reassign-partitions.sh tool?
One downside I can see is that we would not have it handle new partitions
created after the "blacklist operation". As a first iteration I think that
may be acceptable

Thanks,
Stanislav

On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid>
wrote:

>  Hi,
>
> Pinging the list for the feedbacks of this KIP-491  (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982
> )
>
>
> Thanks,
> George
>
>    On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <
> sql_consult...@yahoo.com.INVALID> wrote:
>
>  Hi,
>
> I have created KIP-491 (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982)
> for putting a broker to the preferred leader blacklist or deprioritized
> list so when determining leadership,  it's moved to the lowest priority for
> some of the listed use-cases.
>
> Please provide your comments/feedbacks.
>
> Thanks,
> George
>
>
>
>  ----- Forwarded Message ----- From: Jose Armando Garcia Sancio (JIRA) <
> j...@apache.org>To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>Sent:
> Tuesday, July 9, 2019, 01:06:05 PM PDTSubject: [jira] [Commented]
> (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
>
>    [
> https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511
> ]
>
> Jose Armando Garcia Sancio commented on KAFKA-8638:
> ---------------------------------------------------
>
> Thanks for feedback and clear use cases [~sql_consulting].
>
> > Preferred Leader Blacklist (deprioritized list)
> > -----------------------------------------------
> >
> >                Key: KAFKA-8638
> >                URL: https://issues.apache.org/jira/browse/KAFKA-8638
> >            Project: Kafka
> >          Issue Type: Improvement
> >          Components: config, controller, core
> >    Affects Versions: 1.1.1, 2.3.0, 2.2.1
> >            Reporter: GEORGE LI
> >            Assignee: GEORGE LI
> >            Priority: Major
> >
> > Currently, the kafka preferred leader election will pick the broker_id
> in the topic/partition replica assignments in a priority order when the
> broker is in ISR. The preferred leader is the broker id in the first
> position of replica. There are use-cases that, even the first broker in the
> replica assignment is in ISR, there is a need for it to be moved to the end
> of ordering (lowest priority) when deciding leadership during  preferred
> leader election.
> > Let’s use topic/partition replica (1,2,3) as an example. 1 is the
> preferred leader.  When preferred leadership is run, it will pick 1 as the
> leader if it's ISR, if 1 is not online and in ISR, then pick 2, if 2 is not
> in ISR, then pick 3 as the leader. There are use cases that, even 1 is in
> ISR, we would like it to be moved to the end of ordering (lowest priority)
> when deciding leadership during preferred leader election.  Below is a list
> of use cases:
> > * (If broker_id 1 is a swapped failed host and brought up with last
> segments or latest offset without historical data (There is another effort
> on this), it's better for it to not serve leadership till it's caught-up.
> > * The cross-data center cluster has AWS instances which have less
> computing power than the on-prem bare metal machines.  We could put the AWS
> broker_ids in Preferred Leader Blacklist, so on-prem brokers can be elected
> leaders, without changing the reassignments ordering of the replicas.
> > * If the broker_id 1 is constantly losing leadership after some time:
> "Flapping". we would want to exclude 1 to be a leader unless all other
> brokers of this topic/partition are offline.  The “Flapping” effect was
> seen in the past when 2 or more brokers were bad, when they lost leadership
> constantly/quickly, the sets of partition replicas they belong to will see
> leadership constantly changing.  The ultimate solution is to swap these bad
> hosts.  But for quick mitigation, we can also put the bad hosts in the
> Preferred Leader Blacklist to move the priority of its being elected as
> leaders to the lowest.
> > *  If the controller is busy serving an extra load of metadata requests
> and other tasks. we would like to put the controller's leaders to other
> brokers to lower its CPU load. currently bouncing to lose leadership would
> not work for Controller, because after the bounce, the controller fails
> over to another broker.
> > * Avoid bouncing broker in order to lose its leadership: it would be
> good if we have a way to specify which broker should be excluded from
> serving traffic/leadership (without changing the replica assignment
> ordering by reassignments, even though that's quick), and run preferred
> leader election.  A bouncing broker will cause temporary URP, and sometimes
> other issues.  Also a bouncing of broker (e.g. broker_id 1) can temporarily
> lose all its leadership, but if another broker (e.g. broker_id 2) fails or
> gets bounced, some of its leaderships will likely failover to broker_id 1
> on a replica with 3 brokers.  If broker_id 1 is in the blacklist, then in
> such a scenario even broker_id 2 offline,  the 3rd broker can take
> leadership.
> > The current work-around of the above is to change the topic/partition's
> replica reassignments to move the broker_id 1 from the first position to
> the last position and run preferred leader election. e.g. (1, 2, 3) => (2,
> 3, 1). This changes the replica reassignments, and we need to keep track of
> the original one and restore if things change (e.g. controller fails over
> to another broker, the swapped empty broker caught up). That’s a rather
> tedious task.
> >
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)  
