I'm interested on this. It sounds like a weighted load balancer and valuable 
for those users deploy their hbase cluster on cloud servers. 
You can create a jira and make a patch for better discussion.







At 2019-06-18 05:00:54, "Pierre Zemb" <[email protected]> wrote:
>Hi!
>
>My name is Pierre, I'm working at OVH, an European cloud-provider. Our
>team, Observability, is heavily relying on HBase to store telemetry. We
>would like to open the discussion about adding into 1.4X and 2.X a new
>Balancer.
><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation>Our
>situation
>
>The Observability team in OVH is responsible to handle logs and metrics
>from all servers/applications/equipments within OVH. HBase is used as the
>datastore for metrics. We are using an open-source software called Warp10
><https://warp10.io> to handle all the metrics coming from OVH's
>infrastructure. We are operating three HBase 1.4 clusters, including one
>with 218 RegionServers which is growing every month.
>
>We found out that *in our usecase*(single table, dedicated HBase and Hadoop
>tuned for our usecase, good key distribution)*, the number of regions per
>RS was the real limit for us*.
>
>Over the years, due to historical reasons and also the need to benchmark
>new machines, we ended-up with differents groups of hardware: some servers
>can handle only 180 regions, whereas the biggest can handle more than 900.
>Because of such a difference, we had to disable the LoadBalancing to avoid
>the roundRobinAssigmnent. We developed some internal tooling which are
>responsible for load balancing regions across RegionServers. That was 1.5
>year ago.
>
>Today, we are thinking about fully integrate it within HBase, using the
>LoadBalancer interface. We started working on a new Balancer called
>HeterogeneousBalancer, that will be able to fullfill our need.
><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works>How
>does it works?
>
>A rule file is loaded before balancing. It contains lines of rules. A rule
>is composed of a regexp for hostname, and a limit. For example, we could
>have:
>
>rs[0-9] 200
>rs1[0-9] 50
>
>RegionServers with hostname matching the first rules will have a limit of
>200, and the others 50. If there's no match, a default is set.
>
>Thanks to the rule, we have two informations: the max number of regions for
>this cluster, and the rules for each servers. HeterogeneousBalancer will
>try to balance regions according to their capacity.
>
>Let's take an example. Let's say that we have 20 RS:
>
>   - 10 RS, named through rs0 to rs9 loaded with 60 regions each, and each
>   can handle 200 regions.
>   - 10 RS, named through rs10 to rs19 loaded with 60 regions each, and
>   each can support 50 regions.
>
>Based on the following rules:
>
>rs[0-9] 200
>rs1[0-9] 50
>
>The second group is overloaded, whereas the first group has plenty of space.
>
>We know that we can handle at maximum *2500 regions* (200*10 + 50*10) and
>we have currently *1200 regions* (60*20). HeterogeneousBalancer will
>understand that the cluster is *full at 48.0%* (1200/2500). Based on this
>information, we will then *try to put all the RegionServers to ~48% of load
>according to the rules.* In this case, it will move regions from the second
>group to the first.
>
>The balancer will:
>
>   - compute how many regions needs to be moved. In our example, by moving
>   36 regions on rs10, we could go from 120.0% to 46.0%
>   - select regions with lowest data-locality
>   - try to find an appropriate RS for the region. We will take the lowest
>   available RS.
>
><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#current-status>Current
>status
>
>We started the implementation, but it is not finished yet. we are planning
>to deploy it on a cluster with lower impact for testing, and then put it on
>our biggest cluster.
>
>We have some basic implementation of all methods, but we need to add more
>tests and make the code more robust. You can find the proof-of-concept here
><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
>and some early tests here
><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
>here
><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>,
>and here
><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
>We wrote the balancer for our use-case, which means that:
>
>   - there is one table
>   - there is no region-replica
>   - good key dispersion
>   - there is no regions on master
>
>However, we believe that this will not be too complicated to implement. We
>are also thinking about the possibility to limit overassigments of regions
>by moving them to the least loaded RS.
>
>Even if the balancing strategy seems simple, we do think that having the
>possibility to run HBase cluster on heterogeneous hardware is vital,
>especially in cloud environment, because you may not be able to buy the
>same server specs throughout the years.
>
>What do you think about our approach? Are you interested for such a
>contribution?
>---
>
>Pierre ZEMB - OVH Group
>Observability/Metrics - Infrastructure Engineer
>pierrezemb.fr
>+33 7 86 95 61 65

Reply via email to