Hi George,
Thanks for addressing the comments. I do not have any more questions.
On Wed, Aug 7, 2019 at 11:08 AM George Li <sql_consult...@yahoo.com.invalid> wrote:
>
> Hi Colin, Satish, Stanislav,
>
> Did I answer all your comments/concerns for KIP-491? Please let me know if you have more questions regarding this feature. I would like to start coding soon. I hope this feature can get into the open source trunk so that every time we upgrade Kafka in our environment, we don't need to cherry-pick this.
>
> BTW, I have added the following to KIP-491 for the auto.leader.rebalance.enable behavior with the new Preferred Leader "Blacklist":
>
> "When auto.leader.rebalance.enable is enabled, the broker(s) in the preferred leader "blacklist" should be excluded from being elected leaders."
>
> Thanks,
> George
>
> On Friday, August 2, 2019, 08:02:07 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:
>
> Hi Colin,
> Thanks for looking into this KIP. Sorry for the late response; I've been busy.
>
> If a cluster has MANY topic partitions, moving this "blacklisted" broker to the end of the replica list is still a rather "big" operation, involving submitting reassignments. The KIP-491 way of blacklisting is much simpler/easier and can be undone easily without changing the replica assignment ordering.
> The major use case for me: a failed broker gets swapped with new hardware and starts up empty (with the latest offset of all partitions). The retention SLA is 1 day, so until this broker has been in-sync for 1 day, we would like to blacklist it from serving traffic. After 1 day, the blacklist is removed and preferred leader election is run. This way, there is no need to run reassignments before/after. This is the "temporary" use case.
>
> There are use cases where this Preferred Leader "blacklist" can be somewhat permanent, as I explained for the AWS data center instances vs. on-premises data center bare metal machines (heterogeneous hardware): the AWS broker_ids will be blacklisted. So newly created topics, or existing topic expansion, would not make them serve traffic even if they could be the preferred leader.
>
> Please let me know if there are more questions.
>
> Thanks,
> George
>
> On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe <cmcc...@apache.org> wrote:
>
> We still want to give the "blacklisted" broker the leadership if nobody else is available. Therefore, isn't putting a broker on the blacklist pretty much the same as moving it to the last entry in the replicas list and then triggering a preferred leader election?
>
> If we want this to be undone after a certain amount of time, or under certain conditions, that seems like something that would be more effectively done by an external system, rather than putting all these policies into Kafka.
>
> best,
> Colin
>
> On Fri, Jul 19, 2019, at 18:23, George Li wrote:
> > Hi Satish,
> > Thanks for the reviews and feedback.
> >
> > > >> The following are the requirements this KIP is trying to accomplish:
> > >
> > > This can be moved to the "Proposed changes" section.
> >
> > Updated KIP-491.
> >
> > > >> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> > >
> > > I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.
> >
> > Yes.
> > The partition assignment remains the same, both replicas and ordering. The blacklist logic can be optimized during implementation.
> >
> > > >> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. A Topic level preferred leader blacklist might be future enhancement work.
> > >
> > > I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require a topic level preferred blacklist?
> >
> > I don't have any concrete use cases for a Topic level preferred leader blacklist. One scenario I can think of: when a broker has high CPU usage, identify the big topics (high MsgIn, high BytesIn, etc.) and try to move their leaders away from this broker. Before doing an actual reassignment to change the preferred leader, put this preferred_leader_blacklist in the Topic level config, run preferred leader election, and see whether CPU decreases for this broker. If yes, then do the reassignments to change the preferred leaders "permanently" (the topic may have many partitions, e.g. 256, quite a few of which have this broker as the preferred leader). So this Topic level config is an easy way to run a trial and check the result.
> >
> > > You can add the below workaround as an item in the rejected alternatives section:
> > > "Reassigning all the topic/partitions which the intended broker is a replica for."
> >
> > Updated KIP-491.
> >
> > Thanks,
> > George
> >
> > On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana <satish.dugg...@gmail.com> wrote:
> >
> > Thanks for the KIP. I have put my comments below.
> >
> > This is a nice improvement to avoid cumbersome maintenance.
> >
> > >> The following are the requirements this KIP is trying to accomplish:
> > >> The ability to add and remove the preferred leader deprioritized list/blacklist, e.g. a new ZK path/node or new dynamic config.
> >
> > This can be moved to the "Proposed changes" section.
> >
> > >> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> >
> > I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.
> >
> > >> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. A Topic level preferred leader blacklist might be future enhancement work.
> >
> > I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require a topic level preferred blacklist?
> >
> > You can add the below workaround as an item in the rejected alternatives section:
> > "Reassigning all the topic/partitions which the intended broker is a replica for."
> >
> > Thanks,
> > Satish.
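To make the semantics discussed above concrete ("skipped until other brokers in the list are unavailable"), here is a minimal, illustrative sketch in Java. It is not the KIP's or the controller's actual code, and all names are made up; it only shows the selection rule: in-sync, live replicas are considered in assignment order, and a blacklisted broker is chosen only when no other candidate exists.

  import java.util.List;
  import java.util.Optional;
  import java.util.Set;

  public class BlacklistAwareLeaderElection {

      // Pick a leader from the assigned replicas: the first in-sync, live, non-blacklisted
      // broker in assignment order; fall back to a blacklisted one only if necessary.
      static Optional<Integer> chooseLeader(List<Integer> assignment,
                                            Set<Integer> isr,
                                            Set<Integer> liveBrokers,
                                            Set<Integer> blacklist) {
          Optional<Integer> fallback = Optional.empty();
          for (int broker : assignment) {
              if (!isr.contains(broker) || !liveBrokers.contains(broker)) {
                  continue;                          // not eligible at all
              }
              if (!blacklist.contains(broker)) {
                  return Optional.of(broker);        // first non-blacklisted candidate wins
              }
              if (!fallback.isPresent()) {
                  fallback = Optional.of(broker);    // remember a blacklisted candidate as last resort
              }
          }
          return fallback;                           // empty if no eligible replica exists
      }

      public static void main(String[] args) {
          List<Integer> assignment = List.of(1, 2, 3);   // broker 1 is the preferred leader
          Set<Integer> blacklist = Set.of(1);

          // All brokers alive and in-sync: broker 2 is elected, broker 1 is skipped.
          System.out.println(chooseLeader(assignment, Set.of(1, 2, 3), Set.of(1, 2, 3), blacklist));
          // Only broker 1 is available: it still becomes leader despite the blacklist.
          System.out.println(chooseLeader(assignment, Set.of(1), Set.of(1), blacklist));
      }
  }

Note that the replica assignment (1, 2, 3) itself is never reordered here, which is the point made above: the blacklist only changes the election priority, not the stored assignment.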
> >
> > On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:
> > >
> > > Hey George,
> > >
> > > Thanks for the KIP, it's an interesting idea.
> > >
> > > I was wondering whether we could achieve the same thing via the kafka-reassign-partitions tool. As you had also said in the JIRA, it is true that this is currently very tedious with the tool. My thoughts are that we could improve the tool and give it the notion of a "blacklisted preferred leader".
> > > This would have some benefits like:
> > > - more fine-grained control over the blacklist. We may not want to blacklist all the preferred leaders, as that would make the blacklisted broker a follower of last resort, which is not very useful. In the case of an underpowered AWS machine or a controller, you might overshoot and make the broker very underutilized if you completely make it leaderless.
> > > - it is not permanent. If we are to have a blacklisted leaders config, rebalancing tools would also need to know about it and manipulate/respect it to achieve a fair balance.
> > > It seems like both problems are tied to balancing partitions; it's just that KIP-491's use case wants to balance them against other factors in a more nuanced way. It makes sense to have both be done from the same place.
> > >
> > > To make note of the motivation section:
> > > > Avoid bouncing broker in order to lose its leadership
> > > The recommended way to make a broker lose its leadership is to run a reassignment on its partitions.
> > > > The cross-data center cluster has AWS cloud instances which have less computing power
> > > We recommend running Kafka on homogeneous machines. It would be cool if the system supported more flexibility in that regard, but that is more nuanced and a preferred leader blacklist may not be the best first approach to the issue.
> > >
> > > Adding a new config which can fundamentally change the way replication is done is complex, both for the system (the replication code is complex enough) and the user. Users would have another potential config that could backfire on them - e.g. if left forgotten.
> > >
> > > Could you think of any downsides to implementing this functionality (or a variation of it) in the kafka-reassign-partitions.sh tool?
> > > One downside I can see is that we would not have it handle new partitions created after the "blacklist operation". As a first iteration I think that may be acceptable.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid> wrote:
> > >
> > > > Hi,
> > > >
> > > > Pinging the list for feedback on KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982).
> > > >
> > > > Thanks,
> > > > George
> > > >
> > > > On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have created KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) for putting a broker into the preferred leader blacklist or deprioritized list, so that when determining leadership it is moved to the lowest priority, for some of the listed use cases.
> > > >
> > > > Please provide your comments/feedback.
> > > >
> > > > Thanks,
> > > > George
> > > >
> > > >
> > > > ----- Forwarded Message -----
> > > > From: Jose Armando Garcia Sancio (JIRA) <j...@apache.org>
> > > > To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>
> > > > Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
> > > > Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
> > > >
> > > > [ https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511 ]
> > > >
> > > > Jose Armando Garcia Sancio commented on KAFKA-8638:
> > > > ---------------------------------------------------
> > > >
> > > > Thanks for feedback and clear use cases [~sql_consulting].
> > > >
> > > > > Preferred Leader Blacklist (deprioritized list)
> > > > > -----------------------------------------------
> > > > >
> > > > > Key: KAFKA-8638
> > > > > URL: https://issues.apache.org/jira/browse/KAFKA-8638
> > > > > Project: Kafka
> > > > > Issue Type: Improvement
> > > > > Components: config, controller, core
> > > > > Affects Versions: 1.1.1, 2.3.0, 2.2.1
> > > > > Reporter: GEORGE LI
> > > > > Assignee: GEORGE LI
> > > > > Priority: Major
> > > > >
> > > > > Currently, the Kafka preferred leader election will pick the broker_id in the topic/partition replica assignment in priority order when the broker is in the ISR. The preferred leader is the broker id in the first position of the replica list. There are use cases where, even if the first broker in the replica assignment is in the ISR, there is a need for it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.
> > > > > Let's use topic/partition replica (1,2,3) as an example. 1 is the preferred leader. When preferred leader election is run, it will pick 1 as the leader if it is in the ISR; if 1 is not online and in the ISR, then it picks 2; if 2 is not in the ISR, then it picks 3 as the leader. There are use cases where, even if 1 is in the ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:
> > > > > * If broker_id 1 is a swapped failed host and is brought up with last segments or the latest offset without historical data (there is another effort on this), it's better for it to not serve leadership till it's caught up.
> > > > > * The cross-data center cluster has AWS instances which have less computing power than the on-prem bare metal machines. We could put the AWS broker_ids in the Preferred Leader Blacklist, so on-prem brokers can be elected leaders, without changing the reassignment ordering of the replicas.
> > > > > * If broker_id 1 is constantly losing leadership after some time ("flapping"), we would want to exclude 1 from being a leader unless all other brokers of this topic/partition are offline. The "flapping" effect was seen in the past when 2 or more brokers were bad; when they lost leadership constantly/quickly, the sets of partition replicas they belong to would see leadership constantly changing. The ultimate solution is to swap these bad hosts.
> > > > > But for quick mitigation, we can also put the bad hosts in the Preferred Leader Blacklist to move the priority of their being elected as leaders to the lowest.
> > > > > * If the controller is busy serving an extra load of metadata requests and other tasks, we would like to move the controller's leaders to other brokers to lower its CPU load. Currently, bouncing the broker to lose leadership does not work for the controller, because after the bounce the controller fails over to another broker.
> > > > > * Avoid bouncing a broker in order to lose its leadership: it would be good if we had a way to specify which broker should be excluded from serving traffic/leadership (without changing the replica assignment ordering by reassignments, even though that's quick), and then run preferred leader election. A bouncing broker will cause temporary URP, and sometimes other issues. Also, a bounce of a broker (e.g. broker_id 1) can temporarily lose all its leadership, but if another broker (e.g. broker_id 2) fails or gets bounced, some of its leadership will likely fail over to broker_id 1 on a replica set with 3 brokers. If broker_id 1 is in the blacklist, then in such a scenario, even if broker_id 2 is offline, the 3rd broker can take leadership.
> > > > > The current workaround for the above is to change the topic/partition's replica assignment to move broker_id 1 from the first position to the last position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need to keep track of the original one and restore it if things change (e.g. the controller fails over to another broker, or the swapped empty broker catches up). That's a rather tedious task.
> > > >
> > > >
> > > > --
> > > > This message was sent by Atlassian JIRA
> > > > (v7.6.3#76005)
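For readers who have not run the workaround described in the ticket: with the tools available in the affected versions it takes two steps, roughly as sketched below. The topic name "foo", partition 0, broker IDs, ZooKeeper address and file names are made-up placeholders, not values from this thread. First submit a reassignment that keeps the same replica set but moves broker 1 to the end, then trigger preferred leader election.

  # reassign.json - same replica set, broker 1 demoted to last: (1,2,3) -> (2,3,1)
  {"version":1,"partitions":[{"topic":"foo","partition":0,"replicas":[2,3,1]}]}

  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file reassign.json --execute

  # election.json - then run preferred leader election for the same partition
  {"partitions":[{"topic":"foo","partition":0}]}

  bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181 \
      --path-to-json-file election.json

Undoing this later means remembering the original ordering and submitting the reverse reassignment for every affected partition, which is exactly the bookkeeping burden the thread describes and KIP-491 aims to remove.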