Hi!

Just a quick heads-up:

   - I added the possibility to load a CostFunction:
   https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194
   It can only load a single function for now.
   - I reimplemented my balancer as a cost function. Moving from a full
   balancer to a single cost function was a huge benefit for us, as we just
   need to implement this:
   https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69

I will now backport my tests, and also add the possibility to load multiple
cost functions.
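
For anyone joining the thread here, the capacity rules from the original
proposal quoted below can be sketched as follows. This is a minimal
standalone illustration, not the code from the commits above: the class and
method names (CapacityRulesSketch, limitFor, clusterUsage, cost) and the
default limit of 100 are made up, matching the regex against the whole
hostname is an assumption, and the cost formula is only one possible way to
score a cluster state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; none of these names exist in HBase.
class CapacityRulesSketch {
    private static final class Rule {
        final Pattern hostname;
        final int limit;
        Rule(Pattern hostname, int limit) {
            this.hostname = hostname;
            this.limit = limit;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final int defaultLimit;

    public CapacityRulesSketch(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    public void addRule(String hostnameRegex, int limit) {
        rules.add(new Rule(Pattern.compile(hostnameRegex), limit));
    }

    // Assumption: the first rule whose regex matches the WHOLE hostname wins,
    // so "rs10" is not captured by "rs[0-9]".
    public int limitFor(String hostname) {
        for (Rule rule : rules) {
            if (rule.hostname.matcher(hostname).matches()) {
                return rule.limit;
            }
        }
        return defaultLimit;
    }

    // Cluster-wide fill ratio: total regions / total capacity.
    public double clusterUsage(Map<String, Integer> regionsPerServer) {
        long regions = 0;
        long capacity = 0;
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            regions += e.getValue();
            capacity += limitFor(e.getKey());
        }
        return capacity == 0 ? 0.0 : (double) regions / capacity;
    }

    // Illustrative cost in [0, 1]: the fraction of all regions that sit above
    // their server's share (cluster usage * server limit).
    public double cost(Map<String, Integer> regionsPerServer) {
        double target = clusterUsage(regionsPerServer);
        long total = 0;
        double excess = 0.0;
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            total += e.getValue();
            double allowed = target * limitFor(e.getKey());
            excess += Math.max(0.0, e.getValue() - allowed);
        }
        return total == 0 ? 0.0 : excess / total;
    }

    public static void main(String[] args) {
        CapacityRulesSketch sketch = new CapacityRulesSketch(100);
        sketch.addRule("rs[0-9]", 200);
        sketch.addRule("rs1[0-9]", 50);

        // 20 servers, rs0..rs19, loaded with 60 regions each.
        Map<String, Integer> regions = new LinkedHashMap<>();
        for (int i = 0; i < 20; i++) {
            regions.put("rs" + i, 60);
        }
        System.out.println("usage = " + sketch.clusterUsage(regions));
        System.out.println("cost  = " + sketch.cost(regions));
    }
}
```

With the example from the proposal (20 servers holding 60 regions each),
usage is 1200/2500 = 0.48, and each small server holds 36 regions more than
the 24 it should, so this particular cost is 360/1200 = 0.3, up to
floating-point rounding. A fully balanced state (96 regions on each big
server, 24 on each small one) scores approximately 0.
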
---

Pierre ZEMB - OVH Group
Observability/Metrics - Infrastructure Engineer
pierrezemb.fr
+33 7 86 95 61 65


On Sat, Jun 22, 2019 at 19:26, Pierre Zemb <[email protected]> wrote:

> Hi everyone!
>
> Wow, thanks a lot for all the replies! I love Clay's idea as it will be
> really cool to mix user-defined and mainline stochastic functions.
>
> I finally created the JIRA:
> https://issues.apache.org/jira/browse/HBASE-22618
>
> Let's make Heterogeneous deployments a reality!
>
> On Fri, Jun 21, 2019 at 11:29, bhupendra jain <[email protected]>
> wrote:
>
>> It's a good idea for heterogeneous deployments.
>>
>> Internally, we are also discussing a similar solution. In our approach,
>> we were also thinking of adding an "RS Label" feature, similar to the
>> Hadoop Node Label feature.
>> Each RS can have a label denoting its capabilities/resources. When a
>> user creates a table, its descriptor can carry extra attributes. The
>> balancer can then decide where to host the table's regions based on the
>> RS labels and these attributes.
>> With the RS label feature, the balancer can be more intelligent. For
>> example, tables with a high read load need more cache backed by SSDs, so
>> the regions of such tables should be hosted on RSs that have SSDs.
>>
>> I think your idea is simple and can get in fast as a first step for
>> heterogeneous clusters.
>>
>>
>> Regards
>> Bhupendra Kumar Jain
>> Lead Architect , Enterprise Intelligence, IT&Cloud BU
>>
>> Huawei Technologies India Pvt. Ltd.
>> Survey No. 37, Next to EPIP Area, Kundalahalli, Whitefield
>> Bengaluru-560066, Karnataka
>> Tel: + 91-80-49160700 Ext 71024 II Mob: 9886164367 Email:
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: ramkrishna vasudevan [mailto:[email protected]]
>> Sent: Friday, June 21, 2019 10:31 AM
>> To: dev <[email protected]>
>> Cc: Clay Baenziger <[email protected]>
>> Subject: Re: Adding a new balancer to HBase
>>
>> Clay's idea also seems very good. It would make the implementation
>> much simpler, and the focus would only be on cost functions.
>>
>> Regards
>> Ram
>>
>> On Fri, Jun 21, 2019 at 10:21 AM Anoop John <[email protected]>
>> wrote:
>>
>> > Same question as Clay asked. We can see..
>> >
>> > Also, in general we cannot assume there is only one table in the
>> > cluster. At the top level we offer balancing either per table or per
>> > cluster. This should also be considered for the new balancer, IMO.
>> > Yes, if it can work with a cost function change alone, it will be a
>> > much smaller change. At a high level I am +1 for such a simple way to
>> > handle clusters with heterogeneous nodes.
>> >
>> > Anoop
>> >
>> > On Fri, Jun 21, 2019 at 5:15 AM Clay Baenziger (BLOOMBERG/ 731 LEX) <
>> > [email protected]> wrote:
>> >
>> > > Could it work to have the stochastic load balancer use pluggable
>> > > cost functions[1]? Then, could this type of load balancer be
>> > > implemented simply as a new cost function which folks could choose
>> > > to load and mix with the others?
>> > >
>> > > -Clay
>> > >
>> > > [1]: Instead of this static list of cost functions?
>> > > https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L198
>> > >
>> > > From: [email protected] At: 06/20/19 12:54:23 To: [email protected]
>> > > Subject: Re: Adding a new balancer to HBase
>> > >
>> > > Bonjour Pierre,
>> > >
>> > > Some time ago I built (for my own purposes) something similar that I
>> > > called "LoadBasedLoadBalancer", which moves regions based on my
>> > > servers' load and capacity. The load balancer queries the region
>> > > servers to get the number of cores, the allocated heap, the 5-minute
>> > > load average, etc., and balances the regions based on that.
>> > >
>> > > I already felt that need years ago. What you are proposing is a
>> > > simplified version that will most probably be more stable and easier
>> > > to implement. I will be happy to assist you in the process of getting
>> > > that into HBase.
>> > >
>> > > Have you already opened the JIRA to support that?
>> > >
>> > > Thanks,
>> > >
>> > > JMS
>> > >
>> > > On Thu, Jun 20, 2019 at 01:11, ramkrishna vasudevan <
>> > > [email protected]> wrote:
>> > >
>> > > > Seems a very good idea for cloud servers. Please feel free to raise
>> > > > a JIRA and contribute your patch.
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > On Tue, Jun 18, 2019 at 8:09 AM 刘新星 <[email protected]> wrote:
>> > > >
>> > > > >
>> > > > >
>> > > > > I'm interested in this. It sounds like a weighted load balancer,
>> > > > > valuable for users who deploy their HBase clusters on cloud servers.
>> > > > > You can create a JIRA and make a patch for better discussion.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > At 2019-06-18 05:00:54, "Pierre Zemb" <[email protected]> wrote:
>> > > > > >Hi!
>> > > > > >
>> > > > > >My name is Pierre, and I work at OVH, a European cloud provider. Our
>> > > > > >team, Observability, relies heavily on HBase to store telemetry. We
>> > > > > >would like to open the discussion about adding a new Balancer to
>> > > > > >1.4.x and 2.x.
>> > > > > >
>> > > > > >Our situation
>> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation>
>> > > > > >
>> > > > > >The Observability team at OVH is responsible for handling logs and
>> > > > > >metrics from all servers, applications and equipment within OVH.
>> > > > > >HBase is used as the datastore for metrics. We use an open-source
>> > > > > >software called Warp10 <https://warp10.io> to handle all the metrics
>> > > > > >coming from OVH's infrastructure. We operate three HBase 1.4
>> > > > > >clusters, including one with 218 RegionServers which is growing
>> > > > > >every month.
>> > > > > >
>> > > > > >We found out that *in our use case* (single table, dedicated HBase
>> > > > > >and Hadoop tuned for our use case, good key distribution), *the
>> > > > > >number of regions per RS was the real limit for us*.
>> > > > > >
>> > > > > >Over the years, for historical reasons and because of the need to
>> > > > > >benchmark new machines, we ended up with different groups of
>> > > > > >hardware: some servers can handle only 180 regions, whereas the
>> > > > > >biggest can handle more than 900. Because of such a difference, we
>> > > > > >had to disable load balancing to avoid the roundRobinAssignment. We
>> > > > > >developed some internal tooling which is responsible for balancing
>> > > > > >regions across RegionServers. That was 1.5 years ago.
>> > > > > >
>> > > > > >Today, we are thinking about fully integrating it within HBase,
>> > > > > >using the LoadBalancer interface. We started working on a new
>> > > > > >Balancer called HeterogeneousBalancer, which will be able to fulfill
>> > > > > >our needs.
>> > > > > >
>> > > > > >How does it work?
>> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works>
>> > > > > >
>> > > > > >A rule file is loaded before balancing. It contains one rule per
>> > > > > >line. A rule is composed of a regexp for the hostname and a limit.
>> > > > > >For example, we could have:
>> > > > > >
>> > > > > >rs[0-9] 200
>> > > > > >rs1[0-9] 50
>> > > > > >
>> > > > > >RegionServers with a hostname matching the first rule will have a
>> > > > > >limit of 200, and those matching the second a limit of 50. If there
>> > > > > >is no match, a default is used.
>> > > > > >
>> > > > > >Thanks to the rules, we have two pieces of information: the maximum
>> > > > > >number of regions for the cluster, and the limit for each server.
>> > > > > >HeterogeneousBalancer will try to balance regions according to each
>> > > > > >server's capacity.
>> > > > > >
>> > > > > >Let's take an example. Let's say that we have 20 RS:
>> > > > > >
>> > > > > >   - 10 RS, named rs0 to rs9, loaded with 60 regions each; each can
>> > > > > >   handle 200 regions.
>> > > > > >   - 10 RS, named rs10 to rs19, loaded with 60 regions each; each can
>> > > > > >   handle 50 regions.
>> > > > > >
>> > > > > >Based on the following rules:
>> > > > > >
>> > > > > >rs[0-9] 200
>> > > > > >rs1[0-9] 50
>> > > > > >
>> > > > > >The second group is overloaded, whereas the first group has plenty
>> > > > > >of space.
>> > > > > >
>> > > > > >We know that we can handle at most *2500 regions* (200*10 + 50*10)
>> > > > > >and we currently have *1200 regions* (60*20). HeterogeneousBalancer
>> > > > > >will understand that the cluster is *full at 48.0%* (1200/2500).
>> > > > > >Based on this information, we will then *try to bring all the
>> > > > > >RegionServers to ~48% of load according to the rules.* In this case,
>> > > > > >it will move regions from the second group to the first.
>> > > > > >
>> > > > > >The balancer will:
>> > > > > >
>> > > > > >   - compute how many regions need to be moved. In our example, by
>> > > > > >   moving 36 regions off rs10, we could go from 120.0% to 48.0%
>> > > > > >   - select the regions with the lowest data locality
>> > > > > >   - try to find an appropriate RS for each region. We will take the
>> > > > > >   available RS with the lowest load.
>> > > > > >
>> > > > > >Current status
>> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#current-status>
>> > > > > >
>> > > > > >We started the implementation, but it is not finished yet. We are
>> > > > > >planning to deploy it on a lower-impact cluster for testing, and
>> > > > > >then put it on our biggest cluster.
>> > > > > >
>> > > > > >We have a basic implementation of all methods, but we need to add
>> > > > > >more tests and make the code more robust. You can find the
>> > > > > >proof-of-concept here
>> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
>> > > > > >and some early tests here
>> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
>> > > > > >here
>> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>,
>> > > > > >and here
>> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
>> > > > > >We wrote the balancer for our use-case, which means that:
>> > > > > >
>> > > > > >   - there is one table
>> > > > > >   - there is no region-replica
>> > > > > >   - good key dispersion
>> > > > > >   - there are no regions on the master
>> > > > > >
>> > > > > >However, we believe that supporting these cases will not be too
>> > > > > >complicated to implement. We are also thinking about the possibility
>> > > > > >of limiting overassignments of regions by moving them to the least
>> > > > > >loaded RS.
>> > > > > >
>> > > > > >Even if the balancing strategy seems simple, we do think that the
>> > > > > >ability to run an HBase cluster on heterogeneous hardware is vital,
>> > > > > >especially in cloud environments, because you may not be able to buy
>> > > > > >the same server specs throughout the years.
>> > > > > >
>> > > > > >What do you think about our approach? Are you interested in such a
>> > > > > >contribution?
>> > > > > >---
>> > > > > >
>> > > > > >Pierre ZEMB - OVH Group
>> > > > > >Observability/Metrics - Infrastructure Engineer pierrezemb.fr
>> > > > > >+33 7 86 95 61 65
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> >
>>
>
