[ https://issues.apache.org/jira/browse/KUDU-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shenxingwuying resolved KUDU-3390.
----------------------------------
Fix Version/s: 1.17.0
Resolution: Fixed
> add new feature auto leader rebalancer
> --------------------------------------
>
> Key: KUDU-3390
> URL: https://issues.apache.org/jira/browse/KUDU-3390
> Project: Kudu
> Issue Type: New Feature
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
> Fix For: 1.17.0
>
>
> The original Jira is https://issues.apache.org/jira/browse/KUDU-3061; I
> created this new Jira issue to record some additional information.
>
>
> h1. Motivation
> The number of leader replicas per tablet server can become imbalanced over
> time, which leads to load skew on some nodes.
> There are two causes of load skew:
> * The main cause is scan requests. Scans support two replica selection modes:
> LEADER_ONLY (the default) and CLOSEST_REPLICA. Users who need more accurate
> (up-to-date) results choose LEADER_ONLY, so scan load on a tablet server is
> mostly positively correlated with the number of leaders it hosts.
> * The secondary cause is write requests. Leaders receive write requests, while
> followers receive appendEntries (UpdateConsensus in Kudu). The two processing
> paths differ slightly, which introduces hidden variables that may cause
> imbalanced load. Rebalancing leaders balances leaders and followers across
> servers, eliminates these hidden variables, and makes the service more stable.
> Today, users can work around this by writing a script around the kudu CLI
> {{leader_step_down}} command to rebalance leaders, and SREs must schedule
> that script to run periodically.
>
> In our deployment we run more than 1500 Kudu clusters, and more are being
> deployed, so it is hard for SREs to maintain these rebalance script tasks.
> Kudu already has an auto rebalancer for replicas but no auto leader
> rebalancer. We can do better: the leader kudu-master can rebalance leaders
> automatically.
> h1. Solution
> We add an auto leader rebalance task that runs periodically on the kudu-master
> to avoid leader replica skew. Leader rebalancing only transfers leadership; it
> does not copy replicas. The basic idea of the algorithm is that on every
> tserver, number of leaders : number of followers = 1 : (replication_factor - 1).
> If leader rebalancing is needed, the auto (replica) rebalancer should be
> enabled as well. With the leader rebalancer enabled but the auto rebalancer
> disabled, the algorithm still works, but the result is less even: each
> tserver's leader target is proportional to the number of replicas it hosts,
> so leaders can only be as balanced as the replicas are. The algorithm
> converges to this 1 : (replication_factor - 1) target.
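The Python sketch below illustrates the target described above; it is not Kudu's implementation, and the input format, the function name {{plan_leader_transfers}}, and the greedy strategy are all assumptions for illustration. Each tserver's target leader count is its replica count divided by the replication factor, which is the same as a leaders : followers ratio of 1 : (replication_factor - 1). The sketch repeatedly moves leadership of a tablet from its current leader to the follower furthest below target, but only when the move strictly reduces the total squared imbalance, so it always terminates.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input shape: tablet_id -> (current leader tserver, all tservers holding a replica).
TabletMap = Dict[str, Tuple[str, List[str]]]


def plan_leader_transfers(tablets: TabletMap, rf: int) -> List[Tuple[str, str, str]]:
    """Return (tablet_id, old_leader, new_leader) moves; only leadership moves, no replica copies."""
    leaders = {tid: leader for tid, (leader, _) in tablets.items()}
    replicas = Counter(ts for _, members in tablets.values() for ts in members)
    leader_count = Counter(leaders.values())

    def excess(ts: str) -> int:
        # rf * (leaders - replicas / rf): positive means more leaders than the target share.
        # Scaled by rf so every comparison stays in exact integer arithmetic.
        return rf * leader_count[ts] - replicas[ts]

    moves = []
    changed = True
    while changed:
        changed = False
        for tid in sorted(tablets):
            src = leaders[tid]
            followers = [ts for ts in tablets[tid][1] if ts != src]
            if not followers:
                continue
            dst = min(followers, key=excess)
            # One transfer lowers the sum of squared excesses only if the gap exceeds one leader (rf scaled).
            if excess(src) - excess(dst) > rf:
                leader_count[src] -= 1
                leader_count[dst] += 1
                leaders[tid] = dst
                moves.append((tid, src, dst))
                changed = True
    return moves


if __name__ == "__main__":
    # Mirrors the experiment: 40 tablets, replication factor 3, all leaders on ts1.
    demo = {f"t{i:02d}": ("ts1", ["ts1", "ts2", "ts3"]) for i in range(40)}
    final = Counter(leader for leader, _ in demo.values())
    for _, src, dst in plan_leader_transfers(demo, 3):
        final[src] -= 1
        final[dst] += 1
    print(dict(final))
```

Running the script mirrors the experiment below and prints {{{'ts1': 14, 'ts2': 13, 'ts3': 13}}}: 40 / 3 is not a whole number, so any split with 13 or 14 leaders per tserver already meets the target.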
> h1. Leader Rebalance results
> I ran some experiments to evaluate the effect, on a cluster of 3 machines
> running 3 master instances and 3 tserver instances.
> I created a table with 40 tablets (partitions) and a replication factor of 3,
> and loaded 40,000,000 records.
> First, with the leader rebalance function disabled, I manually transferred
> leadership of all tablets to a single tserver and ran writes and scans.
> Then I enabled the leader rebalance function and ran scans again. The
> workload was as follows:
> The scan command: {{./kudu_tools/kudu perf table_scan $master_list Student -columns=id,name,brief,age,score -num_threads=4 -nofill_cache -replica_selection="LEADER"}}
>
> In the table below, values such as 40:0:0 (leader ratio) and 47%, 18%, 19%
> (CPU usage) are listed in node1 : node2 : node3 order; the IO column follows
> the same order.
>
> || ||leader ratio||scan time||CPU usage||memory||IO (throughput, ioutil)||
> |before leader rebalance|40:0:0|811.586 s|47%, 18%, 19%|no change|102 MB/s (ioutil 55%), 8 KB/s (ioutil 2%), 64 KB/s (ioutil 3%)|
> |after leader rebalance|13:14:13|611.012 s|39%, 45%, 35%|no change|53 MB/s (ioutil 31%), 80 MB/s (ioutil 18%), 45 MB/s (ioutil 24%)|
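As a quick arithmetic check on these results: with 3 tservers and a replication factor of 3, every tserver hosts a replica of each of the 40 tablets, so the 1 : (replication_factor - 1) target works out to 40 / 3 ≈ 13.33 leaders per tserver, consistent with the observed 13:14:13 split. The snippet below computes that target and the relative scan-time improvement; all figures are copied from the table, and the variable names are illustrative only.

```python
import math

NUM_TABLETS = 40
REPLICATION_FACTOR = 3
NUM_TSERVERS = 3

# With RF == number of tservers, each tserver holds one replica of every tablet.
replicas_per_tserver = NUM_TABLETS * REPLICATION_FACTOR // NUM_TSERVERS  # 40
target_leaders = replicas_per_tserver / REPLICATION_FACTOR  # 13.33...
low, high = math.floor(target_leaders), math.ceil(target_leaders)  # whole-number counts: 13 or 14

before_s, after_s = 811.586, 611.012  # scan time before / after leader rebalance
reduction = (before_s - after_s) / before_s
speedup = before_s / after_s

print(f"target leaders per tserver: {target_leaders:.2f} ({low} or {high})")
print(f"scan time reduced by {reduction:.1%} ({speedup:.2f}x)")
```

It prints a target of 13.33 (13 or 14) leaders per tserver and a scan-time reduction of 24.7% (1.33x).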