Hi everyone!

Wow, thanks a lot for all the replies! I love Clay's idea as it will be
really cool to mix user-defined and mainline stochastic functions.

I finally created the JIRA:
https://issues.apache.org/jira/browse/HBASE-22618

Let's make Heterogeneous deployments a reality!
---

Pierre ZEMB - OVH Group
Observability/Metrics - Infrastructure Engineer
pierrezemb.fr
+33 7 86 95 61 65


On Fri, Jun 21, 2019 at 11:29, bhupendra jain <[email protected]> wrote:

> It's a good idea for heterogeneous deployments.
>
> Internally, we are also discussing a similar solution. In our approach,
> we were also thinking of adding an "RS Label" feature, similar to the
> Hadoop Node Label feature.
> Each RS can have a label denoting its capabilities / resources. When a
> user creates a table, its descriptor can carry extra attributes. The
> balancer can then decide where to host the table's regions based on the
> RS labels and these attributes.
> With the RS label feature, the balancer can be more intelligent. For
> example, tables with a high read load need more cache backed by SSDs, so
> the regions of such tables should be hosted on RSs that have SSDs ...
>
> I think your idea is simple and can get in fast as a first step towards
> heterogeneous clusters.
>
>
> Regards
> Bhupendra Kumar Jain
> Lead Architect , Enterprise Intelligence, IT&Cloud BU
>
> Huawei Technologies India Pvt. Ltd.
> Survey No. 37, Next to EPIP Area, Kundalahalli, Whitefield
> Bengaluru-560066, Karnataka
> Tel: + 91-80-49160700 Ext 71024 II Mob: 9886164367 Email:
> [email protected]
>
>
>
> -----Original Message-----
> From: ramkrishna vasudevan [mailto:[email protected]]
> Sent: Friday, June 21, 2019 10:31 AM
> To: dev <[email protected]>
> Cc: Clay Baenziger <[email protected]>
> Subject: Re: Adding a new balancer to HBase
>
> Clay's idea also seems very good. It would make the implementation much
> simpler, and the focus would only be on cost functions.
>
> Regards
> Ram
>
> On Fri, Jun 21, 2019 at 10:21 AM Anoop John <[email protected]> wrote:
>
> > Same question as Clay asked. We can see.
> >
> > Also, in general we cannot assume that there is only one table in the
> > cluster. At the top level we offer options to balance either per table
> > or per cluster only. The new balancer should take this into account as
> > well, IMO. Yes, if it can work with a cost function change alone, it
> > will be a much smaller change. At a high level, I am +1 for such a
> > simple way to handle clusters with heterogeneous nodes.
> >
> > Anoop
> >
> > On Fri, Jun 21, 2019 at 5:15 AM Clay Baenziger (BLOOMBERG/ 731 LEX) <
> > [email protected]> wrote:
> >
> > > > Could it work to have the stochastic load balancer use pluggable
> > > > cost functions[1]? Then, could this type of a load balancer be
> > > > implemented simply as a new cost function which folks could choose
> > > > to load and mix with the others?
> > >
> > > -Clay
> > >
> > > > [1]: Instead of this static list of cost functions?
> > > > https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L198
> > >
> > > > From: [email protected] At: 06/20/19 12:54:23 To: [email protected]
> > > > Subject: Re: Adding a new balancer to HBase
> > >
> > > Bonjour Pierre,
> > >
> > > > Some time ago I built (for my own purposes) something similar that I
> > > > called "LoadBasedLoadBalancer", which moves regions based on my
> > > > servers' load and capacity. The load balancer queries the region
> > > > servers to get the number of cores, the allocated heap, the 5-minute
> > > > load average, etc., and balances the regions based on that.
> > > >
> > > > I already felt that need years ago. What you are proposing is a
> > > > simplified version that will most probably be more stable and easier
> > > > to implement. I will be happy to assist you in the process of getting
> > > > that into HBase.
> > >
> > > Have you already opened the JIRA to support that?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > > > On Thu, Jun 20, 2019 at 01:11, ramkrishna vasudevan <[email protected]> wrote:
> > >
> > > > > Seems a very good idea for cloud servers. Please feel free to raise
> > > > > a JIRA and contribute your patch.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Tue, Jun 18, 2019 at 8:09 AM 刘新星 <[email protected]> wrote:
> > > >
> > > > >
> > > > >
> > > > > > I'm interested in this. It sounds like a weighted load balancer,
> > > > > > and would be valuable for users who deploy their HBase cluster on
> > > > > > cloud servers. You can create a JIRA and make a patch for better
> > > > > > discussion.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > At 2019-06-18 05:00:54, "Pierre Zemb" <[email protected]> wrote:
> > > > > >Hi!
> > > > > >
> > > > > > >My name is Pierre, and I work at OVH, a European cloud provider. Our
> > > > > > >team, Observability, relies heavily on HBase to store telemetry. We
> > > > > > >would like to open a discussion about adding a new Balancer to 1.4.X
> > > > > > >and 2.X.
> > > > > ><
> > > > >
> > > >
> > >
> > https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-s
> > ituation
> > > > > >Our
> > > > > >situation
> > > > > >
> > > > > > >The Observability team at OVH is responsible for handling logs and
> > > > > > >metrics from all servers, applications, and equipment within OVH.
> > > > > > >HBase is used as the datastore for metrics. We use an open-source
> > > > > > >software called Warp10 <https://warp10.io> to handle all the metrics
> > > > > > >coming from OVH's infrastructure. We operate three HBase 1.4
> > > > > > >clusters, including one with 218 RegionServers that grows every
> > > > > > >month.
> > > > > >
> > > > > > >We found out that *in our use case* (single table, dedicated HBase
> > > > > > >and Hadoop tuned for our use case, good key distribution), *the
> > > > > > >number of regions per RS was the real limit for us*.
> > > > > >
> > > > > > >Over the years, for historical reasons and because of the need to
> > > > > > >benchmark new machines, we ended up with different groups of
> > > > > > >hardware: some servers can handle only 180 regions, whereas the
> > > > > > >biggest can handle more than 900. Because of this difference, we had
> > > > > > >to disable the LoadBalancing to avoid the roundRobinAssignment. We
> > > > > > >developed internal tooling that is responsible for load balancing
> > > > > > >regions across RegionServers. That was 1.5 years ago.
> > > > > >
> > > > > > >Today, we are thinking about fully integrating it within HBase, using
> > > > > > >the LoadBalancer interface. We started working on a new Balancer
> > > > > > >called HeterogeneousBalancer that will be able to fulfill our need.
> > > > > ><
> > > > >
> > > >
> > >
> > >
> > https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-d
> > oes-it-wor
> > > ks
> > > > > >How
> > > > > >does it works?
> > > > > >
> > > > > > >A rule file is loaded before balancing. It contains lines of rules. A
> > > > > > >rule is composed of a regexp for the hostname and a limit. For
> > > > > > >example, we could have:
> > > > > >
> > > > > >rs[0-9] 200
> > > > > >rs1[0-9] 50
> > > > > >
> > > > > > >RegionServers with a hostname matching the first rule will have a
> > > > > > >limit of 200, and those matching the second rule a limit of 50. If
> > > > > > >there is no match, a default is set.
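
A minimal sketch of how such a rule file could be parsed and applied. It assumes one "<regexp> <limit>" rule per line, that the first matching rule wins, and that the regexp must match the whole hostname (otherwise rs[0-9] would also match rs10). The names RuleSketch, Rule, parse and limitFor, and the default of 100, are hypothetical and are not taken from the proof-of-concept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of rule-file handling; not the proof-of-concept's code. */
final class RuleSketch {

    /** One rule line: a hostname regexp and a region limit. */
    static final class Rule {
        final Pattern hostPattern;
        final int limit;

        Rule(Pattern hostPattern, int limit) {
            this.hostPattern = hostPattern;
            this.limit = limit;
        }
    }

    /** Parses lines of the form "<regexp> <limit>", skipping blank lines. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.add(new Rule(Pattern.compile(parts[0]), Integer.parseInt(parts[1])));
        }
        return rules;
    }

    /** Returns the limit of the first rule matching the whole hostname, else the default. */
    static int limitFor(String hostname, List<Rule> rules, int defaultLimit) {
        for (Rule rule : rules) {
            // matches() requires a full match (assumption), so rs[0-9] does not capture rs12.
            if (rule.hostPattern.matcher(hostname).matches()) {
                return rule.limit;
            }
        }
        return defaultLimit;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList("rs[0-9] 200", "rs1[0-9] 50"));
        System.out.println(limitFor("rs3", rules, 100));
        System.out.println(limitFor("rs12", rules, 100));
        System.out.println(limitFor("other", rules, 100));
    }
}
```

With the two example rules, rs3 gets a limit of 200, rs12 gets 50, and a hostname matching neither rule falls back to the assumed default.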
> > > > > >
> > > > > > >Thanks to the rules, we have two pieces of information: the maximum
> > > > > > >number of regions for the cluster, and the limit for each server.
> > > > > > >HeterogeneousBalancer will try to balance regions according to each
> > > > > > >server's capacity.
> > > > > >
> > > > > >Let's take an example. Let's say that we have 20 RS:
> > > > > >
> > > > > > >   - 10 RS, named rs0 through rs9, loaded with 60 regions each, and
> > > > > > >   each can handle 200 regions.
> > > > > > >   - 10 RS, named rs10 through rs19, loaded with 60 regions each, and
> > > > > > >   each can handle 50 regions.
> > > > > >
> > > > > >Based on the following rules:
> > > > > >
> > > > > >rs[0-9] 200
> > > > > >rs1[0-9] 50
> > > > > >
> > > > > > >The second group is overloaded, whereas the first group has plenty of
> > > > > > >space.
> > > > > >
> > > > > > >We know that we can handle at most *2500 regions* (200*10 + 50*10),
> > > > > > >and we currently have *1200 regions* (60*20). HeterogeneousBalancer
> > > > > > >will understand that the cluster is *full at 48.0%* (1200/2500).
> > > > > > >Based on this information, we will then *try to bring all the
> > > > > > >RegionServers to ~48% of load according to the rules.* In this case,
> > > > > > >it will move regions from the second group to the first.
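
The arithmetic above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and rounding the per-server target down with integer division is an assumption, not something stated in the proposal.

```java
/** Hypothetical sketch of the cluster-wide fill computation; not the proof-of-concept's code. */
final class FillRatioSketch {

    /** Sums per-server values (region counts or limits). */
    static long sum(int[] values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    /**
     * Number of regions a server with the given limit should hold so that its
     * load matches the cluster-wide ratio. Integer arithmetic, rounded down (assumption).
     */
    static long targetRegions(int limit, long totalRegions, long totalCapacity) {
        return limit * totalRegions / totalCapacity;
    }

    public static void main(String[] args) {
        int[] limits = new int[20];
        int[] regions = new int[20];
        for (int i = 0; i < 20; i++) {
            limits[i] = i < 10 ? 200 : 50; // rs0..rs9: 200, rs10..rs19: 50
            regions[i] = 60;               // every server currently hosts 60 regions
        }
        long totalCapacity = sum(limits);
        long totalRegions = sum(regions);
        System.out.println("capacity=" + totalCapacity + " regions=" + totalRegions);
        System.out.println("target for limit 200: " + targetRegions(200, totalRegions, totalCapacity));
        System.out.println("target for limit 50: " + targetRegions(50, totalRegions, totalCapacity));
    }
}
```

Under these assumptions, each server of the first group should end up with 96 regions and each server of the second group with 24, which still adds up to the 1200 regions of the cluster.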
> > > > > >
> > > > > >The balancer will:
> > > > > >
> > > > > > >   - compute how many regions need to be moved. In our example, by
> > > > > > >   moving 36 regions off rs10, we could go from 120.0% to 48.0%
> > > > > > >   - select the regions with the lowest data locality
> > > > > > >   - try to find an appropriate RS for each region. We will take the
> > > > > > >   least loaded available RS.
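
A rough sketch of these three steps, under stated assumptions: the per-server target is the limit scaled by the cluster-wide ratio and rounded down, locality is a number between 0 and 1 per region, and "lowest available RS" is read as the server with the lowest regions/limit ratio. All names are hypothetical and none of this is the proof-of-concept's code. With the figures above, a server limited to 50 regions and hosting 60 has a target of 24, so 36 regions must move.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the three balancing steps; not the proof-of-concept's code. */
final class MovePlanSketch {

    /** Step 1: how many regions must leave a server to reach the cluster-wide ratio. */
    static int excess(int regions, int limit, long totalRegions, long totalCapacity) {
        long target = limit * totalRegions / totalCapacity; // rounded down (assumption)
        return (int) Math.max(0, regions - target);
    }

    /** Step 2: picks the n regions with the lowest data locality (0.0 to 1.0). */
    static List<String> pickLowestLocality(Map<String, Double> localityByRegion, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(localityByRegion.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            picked.add(entries.get(i).getKey());
        }
        return picked;
    }

    /**
     * Step 3: index of the server with the lowest regions/limit ratio.
     * Assumes non-empty arrays of equal length and strictly positive limits.
     */
    static int leastLoaded(int[] regions, int[] limits) {
        int best = 0;
        for (int i = 1; i < regions.length; i++) {
            // Compare regions[i]/limits[i] with regions[best]/limits[best] without floating point.
            if ((long) regions[i] * limits[best] < (long) regions[best] * limits[i]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("regions to move off rs10: " + excess(60, 50, 1200, 2500));
        System.out.println("destination index: " + leastLoaded(new int[] {60, 24}, new int[] {200, 50}));
    }
}
```

A real balancer would also have to update the destination's region count after every move, so that a single server does not receive all the moved regions.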
> > > > > >
> > > > > ><
> > > > >
> > > >
> > >
> > >
> > https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#curre
> > nt-status
> > > > > >Current
> > > > > >status
> > > > > >
> > > > > > >We started the implementation, but it is not finished yet. We are
> > > > > > >planning to deploy it on a lower-impact cluster for testing, and then
> > > > > > >put it on our biggest cluster.
> > > > > >
> > > > > > >We have a basic implementation of all methods, but we need to add
> > > > > > >more tests and make the code more robust. You can find the
> > > > > > >proof-of-concept here
> > > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> > > > > > >and some early tests here
> > > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> > > > > > >here
> > > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>,
> > > > > > >and here
> > > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
> > > > > > >
> > > > > >We wrote the balancer for our use-case, which means that:
> > > > > >
> > > > > >   - there is one table
> > > > > > >   - there are no region replicas
> > > > > > >   - there is good key dispersion
> > > > > > >   - there are no regions on the master
> > > > > >
> > > > > > >However, we believe that supporting the other cases will not be too
> > > > > > >complicated to implement. We are also thinking about limiting
> > > > > > >overassignment of regions by moving them to the least loaded RS.
> > > > > >
> > > > > > >Even if the balancing strategy seems simple, we do think that being
> > > > > > >able to run an HBase cluster on heterogeneous hardware is vital,
> > > > > > >especially in cloud environments, because you may not be able to buy
> > > > > > >the same server specs throughout the years.
> > > > > >
> > > > > > >What do you think about our approach? Are you interested in such a
> > > > > > >contribution?
> > > > > >---
> > > > > >
> > > > > >Pierre ZEMB - OVH Group
> > > > > > >Observability/Metrics - Infrastructure Engineer
> > > > > > >pierrezemb.fr
> > > > > >+33 7 86 95 61 65
> > > > >
> > > >
> > >
> > >
> > >
> >
>
