Hi everyone! Wow, thanks a lot for all the replies! I love Clay's idea: it would be really cool to mix user-defined cost functions with the mainline ones in the stochastic load balancer.
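To make the "mix cost functions" idea concrete, here is a minimal sketch of how it could look. Everything in it is an assumption for illustration: the CostFunction interface, the two example costs and the weighted average are hypothetical and are not the actual StochasticLoadBalancer API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: these types are NOT HBase's actual balancer API. */
final class PluggableCostSketch {

    /** Scores a candidate assignment in [0, 1]; lower is better. */
    interface CostFunction {
        double cost(Map<String, Integer> regionsPerServer);
    }

    /** Stand-in for a mainline cost: relative gap between most and least loaded server. */
    static final class RegionCountSkewCost implements CostFunction {
        @Override
        public double cost(Map<String, Integer> regionsPerServer) {
            int min = Integer.MAX_VALUE;
            int max = 0;
            for (int count : regionsPerServer.values()) {
                min = Math.min(min, count);
                max = Math.max(max, count);
            }
            return max == 0 ? 0.0 : (double) (max - min) / max;
        }
    }

    /** Stand-in for a user-defined cost: fraction of servers above their region limit. */
    static final class OverLimitCost implements CostFunction {
        private final Map<String, Integer> limits;

        OverLimitCost(Map<String, Integer> limits) {
            this.limits = limits;
        }

        @Override
        public double cost(Map<String, Integer> regionsPerServer) {
            if (regionsPerServer.isEmpty()) {
                return 0.0;
            }
            int overLimit = 0;
            for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
                // Servers without a limit are treated as unlimited (assumption).
                if (e.getValue() > limits.getOrDefault(e.getKey(), Integer.MAX_VALUE)) {
                    overLimit++;
                }
            }
            return (double) overLimit / regionsPerServer.size();
        }
    }

    /** Weighted average of all registered cost functions. */
    static double totalCost(Map<CostFunction, Double> weightedFunctions,
                            Map<String, Integer> regionsPerServer) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<CostFunction, Double> e : weightedFunctions.entrySet()) {
            weightedSum += e.getValue() * e.getKey().cost(regionsPerServer);
            weightTotal += e.getValue();
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = new LinkedHashMap<>();
        limits.put("rs0", 200);
        limits.put("rs10", 50);
        Map<String, Integer> assignment = new LinkedHashMap<>();
        assignment.put("rs0", 60);
        assignment.put("rs10", 60);

        Map<CostFunction, Double> functions = new LinkedHashMap<>();
        functions.put(new RegionCountSkewCost(), 1.0);
        functions.put(new OverLimitCost(limits), 3.0);
        System.out.println("total cost = " + totalCost(functions, assignment));
    }
}
```

With something along these lines, a heterogeneous-aware cost would just be one more entry in the map, and its weight would decide how much it counts against the existing ones.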
I finally created the JIRA: https://issues.apache.org/jira/browse/HBASE-22618

Let's make heterogeneous deployments a reality!

---
Pierre ZEMB - OVH Group
Observability/Metrics - Infrastructure Engineer
pierrezemb.fr
+33 7 86 95 61 65

On Fri, Jun 21, 2019 at 11:29, bhupendra jain <[email protected]> wrote:

> It's a good idea for heterogeneous deployments.
>
> Internally, we are also discussing a similar solution. In our approach,
> we were also thinking of adding an "RS Label" feature, similar to the
> Hadoop Node Label feature.
> Each RS can have a label to denote its capabilities/resources. When a
> user creates a table, there can be extra attributes in its descriptor.
> The balancer can then decide where to host the table's regions based on
> the RS labels and these attributes.
> With the RS label feature, the balancer can be more intelligent. For
> example, tables with a high read load need more cache backed by SSDs, so
> the regions of such tables should be hosted on RSs that have SSDs...
>
> I think your idea is simple and can get in fast as a first step for
> heterogeneous clusters.
>
> Regards
> Bhupendra Kumar Jain
> Lead Architect, Enterprise Intelligence, IT&Cloud BU
> Huawei Technologies India Pvt. Ltd.
> Survey No. 37, Next to EPIP Area, Kundalahalli, Whitefield
> Bengaluru-560066, Karnataka
> Tel: + 91-80-49160700 Ext 71024 II Mob: 9886164367 Email: [email protected]
>
> -----Original Message-----
> From: ramkrishna vasudevan [mailto:[email protected]]
> Sent: Friday, June 21, 2019 10:31 AM
> To: dev <[email protected]>
> Cc: Clay Baenziger <[email protected]>
> Subject: Re: Adding a new balancer to HBase
>
> Clay's idea also seems very good. It would make the implementation much
> simpler, and the focus would only be on cost functions.
>
> Regards
> Ram
>
> On Fri, Jun 21, 2019 at 10:21 AM Anoop John <[email protected]> wrote:
>
> > Same question as Clay asked. We can see..
> >
> > Also, generically, we cannot assume there is only one table in the
> > cluster. At the top level we give options to balance either per table
> > or per cluster only. This should also be considered for the new
> > balancer, IMO. Yes, if it can work with a cost function change alone,
> > it will be a much smaller change. At a high level, I am +1 for such a
> > simple way to handle clusters with heterogeneous nodes.
> >
> > Anoop
> >
> > On Fri, Jun 21, 2019 at 5:15 AM Clay Baenziger (BLOOMBERG/ 731 LEX)
> > <[email protected]> wrote:
> >
> > > Could it work to have the stochastic load balancer use pluggable
> > > cost functions[1]? Then, could this type of load balancer be
> > > implemented simply as a new cost function which folks could choose
> > > to load and mix with the others?
> > >
> > > -Clay
> > >
> > > [1]: Instead of this static list of cost functions?
> > > https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L198
> > >
> > > From: [email protected] At: 06/20/19 12:54:23
> > > To: [email protected]
> > > Subject: Re: Adding a new balancer to HBase
> > >
> > > Bonjour Pierre,
> > >
> > > Some time ago I built (for my own purposes) something similar that I
> > > called "LoadBasedLoadBalancer", which moves the regions based on my
> > > servers' load and capacity.
> > > The load balancer queries the region servers to get the number of
> > > cores, the allocated heap, the 5-minute average load, etc., and
> > > balances the regions based on that.
> > >
> > > I felt that need years ago already. What you are proposing is a
> > > simplified version that will most probably be more stable and easier
> > > to implement. I will be happy to assist you in the process of getting
> > > that into HBase.
> > >
> > > Have you already opened the JIRA to support that?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > > On Thu, Jun 20, 2019 at 01:11, ramkrishna vasudevan
> > > <[email protected]> wrote:
> > >
> > > > Seems a very good idea for cloud servers. Please feel free to raise
> > > > a JIRA and contribute your patch.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Tue, Jun 18, 2019 at 8:09 AM 刘新星 <[email protected]> wrote:
> > > >
> > > > > I'm interested in this. It sounds like a weighted load balancer,
> > > > > and it would be valuable for users who deploy their HBase
> > > > > clusters on cloud servers.
> > > > > You can create a JIRA and make a patch for better discussion.
> > > > >
> > > > > At 2019-06-18 05:00:54, "Pierre Zemb" <[email protected]> wrote:
> > > > > >Hi!
> > > > > >
> > > > > >My name is Pierre, I'm working at OVH, a European cloud provider.
> > > > > >Our team, Observability, relies heavily on HBase to store
> > > > > >telemetry. We would like to open the discussion about adding a
> > > > > >new Balancer into 1.4X and 2.X.
> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation>
> > > > > >Our situation
> > > > > >
> > > > > >The Observability team at OVH is responsible for handling logs and
> > > > > >metrics from all servers/applications/equipment within OVH.
> > > > > >HBase is used as the datastore for metrics. We are using an
> > > > > >open-source software called Warp10 <https://warp10.io> to handle
> > > > > >all the metrics coming from OVH's infrastructure. We are
> > > > > >operating three HBase 1.4 clusters, including one with 218
> > > > > >RegionServers which is growing every month.
> > > > > >
> > > > > >We found out that *in our use case* (single table, dedicated HBase
> > > > > >and Hadoop tuned for our use case, good key distribution)*, the
> > > > > >number of regions per RS was the real limit for us*.
> > > > > >
> > > > > >Over the years, due to historical reasons and also the need to
> > > > > >benchmark new machines, we ended up with different groups of
> > > > > >hardware: some servers can handle only 180 regions, whereas the
> > > > > >biggest can handle more than 900. Because of such a difference,
> > > > > >we had to disable the LoadBalancing to avoid the
> > > > > >roundRobinAssignment. We developed some internal tooling which is
> > > > > >responsible for load balancing regions across RegionServers.
> > > > > >That was 1.5 years ago.
> > > > > >
> > > > > >Today, we are thinking about fully integrating it within HBase,
> > > > > >using the LoadBalancer interface. We started working on a new
> > > > > >Balancer called HeterogeneousBalancer, which will be able to
> > > > > >fulfill our need.
> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works>
> > > > > >How does it work?
> > > > > >
> > > > > >A rule file is loaded before balancing. It contains lines of
> > > > > >rules. A rule is composed of a regexp for the hostname, and a
> > > > > >limit.
> > > > > >For example, we could have:
> > > > > >
> > > > > >rs[0-9] 200
> > > > > >rs1[0-9] 50
> > > > > >
> > > > > >RegionServers with a hostname matching the first rule will have a
> > > > > >limit of 200, and the others 50. If there's no match, a default
> > > > > >is set.
> > > > > >
> > > > > >Thanks to the rules, we have two pieces of information: the max
> > > > > >number of regions for this cluster, and the rule for each
> > > > > >server. HeterogeneousBalancer will try to balance regions
> > > > > >according to each server's capacity.
> > > > > >
> > > > > >Let's take an example. Let's say that we have 20 RS:
> > > > > >
> > > > > > - 10 RS, named rs0 through rs9, loaded with 60 regions each, and
> > > > > >   each can handle 200 regions.
> > > > > > - 10 RS, named rs10 through rs19, loaded with 60 regions each,
> > > > > >   and each can support 50 regions.
> > > > > >
> > > > > >Based on the following rules:
> > > > > >
> > > > > >rs[0-9] 200
> > > > > >rs1[0-9] 50
> > > > > >
> > > > > >The second group is overloaded, whereas the first group has plenty
> > > > > >of space.
> > > > > >
> > > > > >We know that we can handle at most *2500 regions* (200*10 + 50*10)
> > > > > >and we currently have *1200 regions* (60*20). HeterogeneousBalancer
> > > > > >will understand that the cluster is *full at 48.0%* (1200/2500).
> > > > > >Based on this information, we will then *try to put all the
> > > > > >RegionServers at ~48% of load according to the rules.* In this
> > > > > >case, it will move regions from the second group to the first.
> > > > > >
> > > > > >The balancer will:
> > > > > >
> > > > > > - compute how many regions need to be moved.
> > > > > >   In our example, by moving 36 regions off rs10, we could go
> > > > > >   from 120.0% (60/50) to 48.0% (24/50)
> > > > > > - select regions with lowest data-locality
> > > > > > - try to find an appropriate RS for the region. We will take the
> > > > > >   least loaded available RS.
> > > > > >
> > > > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#current-status>
> > > > > >Current status
> > > > > >
> > > > > >We started the implementation, but it is not finished yet. We are
> > > > > >planning to deploy it on a cluster with lower impact for testing,
> > > > > >and then put it on our biggest cluster.
> > > > > >
> > > > > >We have some basic implementation of all methods, but we need to
> > > > > >add more tests and make the code more robust. You can find the
> > > > > >proof-of-concept here
> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> > > > > >and some early tests here
> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> > > > > >here
> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>,
> > > > > >and here
> > > > > ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
> > > > > >We wrote the balancer for our use-case, which means that:
> > > > > >
> > > > > > - there is one table
> > > > > > - there is no region-replica
> > > > > > - good key dispersion
> > > > > > - there are no regions on master
> > > > > >
> > > > > >However, we believe that supporting the other cases will not be
> > > > > >too complicated to implement. We are also thinking about the
> > > > > >possibility to limit overassignments of regions by moving them to
> > > > > >the least loaded RS.
> > > > > >
> > > > > >Even if the balancing strategy seems simple, we do think that
> > > > > >having the possibility to run an HBase cluster on heterogeneous
> > > > > >hardware is vital, especially in cloud environments, because you
> > > > > >may not be able to buy the same server specs throughout the years.
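For readers who want to check the rule matching and the 48% arithmetic quoted above, here is a minimal sketch. It is not the code from the proof-of-concept branch: the class and method names are hypothetical, and it assumes that the first rule whose regexp matches the whole hostname wins (with a partial match, rs10 would also match `rs[0-9]`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the rule file semantics described in the thread. */
final class HeterogeneousRulesSketch {

    /** One rule line: a hostname regexp and a region limit. */
    static final class Rule {
        final Pattern hostPattern;
        final int limit;

        Rule(Pattern hostPattern, int limit) {
            this.hostPattern = hostPattern;
            this.limit = limit;
        }
    }

    /** Parses lines such as "rs[0-9] 200"; blank lines are skipped. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            rules.add(new Rule(Pattern.compile(parts[0]), Integer.parseInt(parts[1])));
        }
        return rules;
    }

    /** First rule matching the whole hostname wins (assumption); else the default. */
    static int limitFor(String hostname, List<Rule> rules, int defaultLimit) {
        for (Rule rule : rules) {
            if (rule.hostPattern.matcher(hostname).matches()) {
                return rule.limit;
            }
        }
        return defaultLimit;
    }

    /** Cluster fill ratio = total regions / total capacity. */
    static double fillRatio(int totalRegions, int totalCapacity) {
        return (double) totalRegions / totalCapacity;
    }

    /** Target region count for one server = its limit * cluster fill ratio, rounded. */
    static int targetFor(int limit, double fillRatio) {
        return (int) Math.round(limit * fillRatio);
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList("rs[0-9] 200", "rs1[0-9] 50"));
        int capacity = 0;
        int regions = 0;
        for (int i = 0; i < 20; i++) {
            capacity += limitFor("rs" + i, rules, 100); // 100 is an arbitrary default
            regions += 60;                              // every RS currently hosts 60 regions
        }
        double ratio = fillRatio(regions, capacity);
        System.out.println("capacity=" + capacity + " regions=" + regions + " ratio=" + ratio);
        System.out.println("target for rs0=" + targetFor(200, ratio)
            + ", target for rs10=" + targetFor(50, ratio));
    }
}
```

With the numbers from the example this gives a capacity of 2500, a fill ratio of 0.48, and per-server targets of 96 regions for the first group and 24 for the second.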
> > > > > >
> > > > > >What do you think about our approach? Are you interested in such
> > > > > >a contribution?
> > > > > >---
> > > > > >
> > > > > >Pierre ZEMB - OVH Group
> > > > > >Observability/Metrics - Infrastructure Engineer
> > > > > >pierrezemb.fr
> > > > > >+33 7 86 95 61 65
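As a rough illustration of the three balancer steps listed in the original proposal (count the regions to move, pick the ones with the lowest data locality, send them to the least loaded RS), here is a small sketch. The Region type and all method names are hypothetical, and "least loaded" is interpreted here as the lowest load/limit ratio, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the move-planning steps; not the proof-of-concept code. */
final class MovePlanSketch {

    /** Hypothetical region descriptor: a name and its data locality in [0, 1]. */
    static final class Region {
        final String name;
        final double locality;

        Region(String name, double locality) {
            this.name = name;
            this.locality = locality;
        }
    }

    /** Step 1: how many regions must leave a server to reach its target count. */
    static int excess(int currentRegions, int targetRegions) {
        return Math.max(0, currentRegions - targetRegions);
    }

    /** Step 2: the `count` regions with the lowest data locality. */
    static List<Region> pickLowestLocality(List<Region> regions, int count) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingDouble((Region r) -> r.locality));
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    /**
     * Step 3: the server with the lowest load/limit ratio that still has room,
     * or null when every server is at or above its limit.
     */
    static String leastLoaded(Map<String, Integer> load, Map<String, Integer> limits) {
        String best = null;
        double bestRatio = Double.MAX_VALUE;
        for (Map.Entry<String, Integer> e : load.entrySet()) {
            int limit = limits.getOrDefault(e.getKey(), 0);
            if (e.getValue() >= limit) {
                continue; // full, over its limit, or no limit known
            }
            double ratio = (double) e.getValue() / limit;
            if (ratio < bestRatio) {
                bestRatio = ratio;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For rs10 in the example (60 regions hosted, target of 24), `excess(60, 24)` gives the 36 regions to move, and a server from the first group is picked as destination because 60/200 is a lower ratio than 60/50.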
