Clay's idea also seems very good. That would make the implementation much
simpler, and the focus would be only on the cost functions.

Regards
Ram

On Fri, Jun 21, 2019 at 10:21 AM Anoop John <[email protected]> wrote:

> Same question as Clay asked. We can see..
>
> Also, generically, we cannot assume there is only one table in the cluster.
> At the top level we offer options to balance per table or per cluster only.
> This should be considered for the new balancer too, IMO. If it can work
> with a cost function change alone, it will be a much smaller change. At a
> high level, I am +1 for such a simple way to handle heterogeneous-node
> clusters.
>
> Anoop
>
> On Fri, Jun 21, 2019 at 5:15 AM Clay Baenziger (BLOOMBERG/ 731 LEX) <
> [email protected]> wrote:
>
> > Could it work to have the stochastic load balancer use pluggable cost
> > functions[1]? Then, could this type of load balancer be implemented
> > simply as a new cost function which folks could choose to load and mix
> > with the others?
> >
> > -Clay
> >
> > [1]: Instead of this static list of cost functions?
> > https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L198
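To picture Clay's suggestion, a capacity-aware cost function could look roughly like the sketch below. This is illustrative only: it does not use the real StochasticLoadBalancer CostFunction API, and the class and method names are made up.

```java
// Illustrative sketch of a capacity-aware cost function (hypothetical names,
// not the actual HBase CostFunction API). The cost is 0 when no server
// exceeds its configured region limit and grows with the total overload.
public class HeterogeneousCostSketch {

    // regionsPerServer[i] = regions currently on server i,
    // capacityPerServer[i] = region limit for server i (e.g. from a rule file).
    static double cost(int[] regionsPerServer, int[] capacityPerServer) {
        double overload = 0;
        double totalCapacity = 0;
        for (int i = 0; i < regionsPerServer.length; i++) {
            totalCapacity += capacityPerServer[i];
            if (regionsPerServer[i] > capacityPerServer[i]) {
                overload += regionsPerServer[i] - capacityPerServer[i];
            }
        }
        // Normalize into [0, 1] so it could be weighted against other costs.
        return totalCapacity == 0 ? 0 : Math.min(1.0, overload / totalCapacity);
    }

    public static void main(String[] args) {
        // One big host (capacity 200) and one small host (capacity 50),
        // both holding 60 regions: only the small host is overloaded.
        System.out.println(cost(new int[]{60, 60}, new int[]{200, 50})); // prints 0.04
    }
}
```

A cost function along these lines could then be weighted and mixed with the locality and load cost functions the stochastic balancer already uses.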
> >
> > From: [email protected] At: 06/20/19 12:54:23To:
> [email protected]
> > Subject: Re: Adding a new balancer to HBase
> >
> > Hello Pierre,
> >
> > Some time ago I built (for my own purposes) something similar that I
> > called "LoadBasedLoadBalancer", which moves regions based on my servers'
> > load and capacity. The load balancer queries the region servers to get the
> > number of cores, the allocated heap, the 5-minute load average, etc., and
> > balances the regions based on that.
> >
> > I felt that need years ago already. What you are proposing is a
> > simplified version that will most probably be more stable and easier to
> > implement. I will be happy to assist you in the process of getting that
> > into HBase.
> >
> > Have you already opened the JIRA to support that?
> >
> > Thanks,
> >
> > JMS
> >
> > On Thu, Jun 20, 2019 at 1:11 AM ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > This seems a very good idea for cloud servers. Please feel free to
> > > raise a JIRA and contribute your patch.
> > >
> > > Regards
> > > Ram
> > >
> > > On Tue, Jun 18, 2019 at 8:09 AM 刘新星 <[email protected]> wrote:
> > >
> > > >
> > > >
> > > > I'm interested in this. It sounds like a weighted load balancer, and
> > > > valuable for those users who deploy their HBase cluster on cloud
> > > > servers. You can create a JIRA and make a patch for better discussion.
> > > >
> > > >
> > > > At 2019-06-18 05:00:54, "Pierre Zemb" <[email protected]>
> > > wrote:
> > > > >Hi!
> > > > >
> > > > >My name is Pierre, and I'm working at OVH, a European cloud provider.
> > > > >Our team, Observability, relies heavily on HBase to store telemetry. We
> > > > >would like to open a discussion about adding a new balancer into 1.4.x
> > > > >and 2.x.
> > > > >Our situation
> > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation>
> > > > >
> > > > >The Observability team in OVH is responsible for handling logs and
> > > > >metrics from all servers/applications/equipment within OVH. HBase is
> > > > >used as the datastore for metrics. We are using open-source software
> > > > >called Warp10 <https://warp10.io> to handle all the metrics coming from
> > > > >OVH's infrastructure. We are operating three HBase 1.4 clusters,
> > > > >including one with 218 RegionServers, which is growing every month.
> > > > >
> > > > >We found out that *in our use case* (single table, dedicated HBase and
> > > > >Hadoop tuned for our use case, good key distribution), *the number of
> > > > >regions per RS was the real limit for us*.
> > > > >
> > > > >Over the years, due to historical reasons and also the need to
> > > > >benchmark new machines, we ended up with different groups of hardware:
> > > > >some servers can handle only 180 regions, whereas the biggest can
> > > > >handle more than 900. Because of such a difference, we had to disable
> > > > >load balancing to avoid the roundRobinAssignment. We developed some
> > > > >internal tooling which is responsible for load balancing regions across
> > > > >RegionServers. That was 1.5 years ago.
> > > > >
> > > > >Today, we are thinking about fully integrating it within HBase, using
> > > > >the LoadBalancer interface. We started working on a new balancer called
> > > > >HeterogeneousBalancer, which will be able to fulfill our needs.
> > > > >How does it work?
> > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works>
> > > > >
> > > > >A rule file is loaded before balancing. It contains one rule per line.
> > > > >A rule is composed of a regexp for the hostname and a limit. For
> > > > >example, we could have:
> > > > >
> > > > >rs[0-9] 200
> > > > >rs1[0-9] 50
> > > > >
> > > > >RegionServers with a hostname matching the first rule will have a limit
> > > > >of 200, and the others a limit of 50. If there is no match, a default
> > > > >is applied.
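The rule-file lookup described above could be sketched as follows. This is an illustrative reading of the format, not the actual HeterogeneousBalancer code; the class name and the default-limit handling are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of the rule-file lookup (hypothetical class, not the
// actual HeterogeneousBalancer code). Each rule line is
// "<hostname-regexp> <limit>"; the first matching rule wins.
public class RuleMatcherSketch {
    private final Map<Pattern, Integer> rules = new LinkedHashMap<>();
    private final int defaultLimit; // used when no rule matches

    RuleMatcherSketch(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    // Parse one rule line into a compiled regexp and its region limit.
    void addRule(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        rules.put(Pattern.compile(parts[0]), Integer.parseInt(parts[1]));
    }

    // Return the limit of the first rule whose regexp fully matches the
    // hostname, or the default limit when none matches.
    int limitFor(String hostname) {
        for (Map.Entry<Pattern, Integer> e : rules.entrySet()) {
            if (e.getKey().matcher(hostname).matches()) {
                return e.getValue();
            }
        }
        return defaultLimit;
    }

    public static void main(String[] args) {
        RuleMatcherSketch matcher = new RuleMatcherSketch(100);
        matcher.addRule("rs[0-9] 200");
        matcher.addRule("rs1[0-9] 50");
        // rs3 matches the first rule, rs12 the second, host42 neither.
        System.out.println(matcher.limitFor("rs3"));    // prints 200
        System.out.println(matcher.limitFor("rs12"));   // prints 50
        System.out.println(matcher.limitFor("host42")); // prints 100
    }
}
```

Note that `matches()` requires the regexp to cover the full hostname, which is why `rs12` falls through to the second rule rather than partially matching the first.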
> > > > >
> > > > >Thanks to the rules, we have two pieces of information: the maximum
> > > > >number of regions for this cluster, and the limit for each server.
> > > > >HeterogeneousBalancer will try to balance regions according to each
> > > > >server's capacity.
> > > > >
> > > > >Let's take an example. Let's say that we have 20 RS:
> > > > >
> > > > >   - 10 RS, named rs0 through rs9, each loaded with 60 regions and able
> > > > >   to handle 200 regions.
> > > > >   - 10 RS, named rs10 through rs19, each loaded with 60 regions but
> > > > >   able to support only 50 regions.
> > > > >
> > > > >Based on the following rules:
> > > > >
> > > > >rs[0-9] 200
> > > > >rs1[0-9] 50
> > > > >
> > > > >The second group is overloaded, whereas the first group has plenty of
> > > > >space.
> > > > >
> > > > >We know that we can handle at maximum *2500 regions* (200*10 + 50*10),
> > > > >and we currently have *1200 regions* (60*20). HeterogeneousBalancer
> > > > >will understand that the cluster is *48.0% full* (1200/2500). Based on
> > > > >this information, we will then *try to bring all the RegionServers to
> > > > >~48% load according to the rules.* In this case, it will move regions
> > > > >from the second group to the first.
> > > > >
> > > > >The balancer will:
> > > > >
> > > > >   - compute how many regions need to be moved. In our example, by
> > > > >   moving 36 regions off rs10, we could go from 120.0% to 46.0%
> > > > >   - select the regions with the lowest data locality
> > > > >   - try to find an appropriate RS for each region. We will take the
> > > > >   least loaded available RS.
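The target computation in the steps above can be sketched as follows. This is illustrative only: the names are made up, and the real balancer may round differently than the floor used here.

```java
// Illustrative sketch of the target computation (hypothetical names, not the
// actual HeterogeneousBalancer code): bring each RegionServer to roughly the
// cluster-wide utilization of its own capacity.
public class BalanceTargetSketch {

    // Cluster utilization = total regions / total capacity.
    static double clusterUtilization(int[] regions, int[] capacity) {
        double totalRegions = 0;
        double totalCapacity = 0;
        for (int i = 0; i < regions.length; i++) {
            totalRegions += regions[i];
            totalCapacity += capacity[i];
        }
        return totalRegions / totalCapacity;
    }

    // Regions to move off one server so it lands at the target utilization
    // of its capacity (rounding the target down; an assumption of this sketch).
    static int regionsToMove(int regions, int capacity, double targetUtilization) {
        int target = (int) Math.floor(capacity * targetUtilization);
        return Math.max(0, regions - target);
    }

    public static void main(String[] args) {
        // The 20-server example from the thread: 10 hosts with capacity 200,
        // 10 hosts with capacity 50, all carrying 60 regions each.
        int[] regions = new int[20];
        int[] capacity = new int[20];
        for (int i = 0; i < 20; i++) {
            regions[i] = 60;
            capacity[i] = i < 10 ? 200 : 50;
        }
        double u = clusterUtilization(regions, capacity);
        System.out.println(u);                        // prints 0.48 (1200/2500)
        System.out.println(regionsToMove(60, 50, u)); // prints 36 (rs10 case)
    }
}
```

With these numbers an overloaded small host (60 regions, capacity 50) gives up 36 regions, matching the figure in the example.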
> > > > >
> > > > >Current status
> > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#current-status>
> > > > >
> > > > >We started the implementation, but it is not finished yet. We are
> > > > >planning to deploy it on a lower-impact cluster for testing, and then
> > > > >put it on our biggest cluster.
> > > > >
> > > > >We have some basic implementation of all methods, but we need to add
> > > > >more tests and make the code more robust. You can find the
> > > > >proof-of-concept here
> > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> > > > >and some early tests here
> > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>
> > > > >and here
> > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
> > > > >We wrote the balancer for our use case, which means that:
> > > > >
> > > > >   - there is only one table
> > > > >   - there are no region replicas
> > > > >   - there is good key dispersion
> > > > >   - there are no regions on the master
> > > > >
> > > > >However, we believe that this will not be too complicated to
> > > > >implement. We are also thinking about the possibility of limiting
> > > > >over-assignment of regions by moving them to the least loaded RS.
> > > > >
> > > > >Even if the balancing strategy seems simple, we do think that having
> > > > >the possibility to run an HBase cluster on heterogeneous hardware is
> > > > >vital, especially in cloud environments, because you may not be able to
> > > > >buy the same server specs throughout the years.
> > > > >
> > > > >What do you think about our approach? Are you interested in such a
> > > > >contribution?
> > > > >---
> > > > >
> > > > >Pierre ZEMB - OVH Group
> > > > >Observability/Metrics - Infrastructure Engineer
> > > > >pierrezemb.fr
> > > > >+33 7 86 95 61 65
> > > >
> > >
> >
> >
> >
>
