On Sat, May 8, 2010 at 4:47 AM, MauMau <maumau...@gmail.com> wrote:
> Thank you, Ryan
>
> I look forward to future advancements in the load balancing strategy ---
> distributing "hot" regions among lightly loaded region servers based on
> request count.

Here at StumbleUpon we handle 12,000 requests/second; some
regionservers run a bit warmer than others, but it hasn't proven to be
a serious issue.

>
> Thanks for the note that HBase is not a key-value store. Yes, some people
> say the same as you, and others say that HBase is one of the key-value
> stores. I think they call HBase a key-value store in the sense that
> users access data by keys (row key, column key).

I think the equivalent would be to use, for example, MySQL but never
mention or use the join features. I don't want HBase to become
typecast as just a key-value store when you can do substantially more
with it.


>
>> the same ring segment you probably want to take action. Ironically, nodes
>> in Cassandra are special (unlike the publicly stated info); they
>> "belong" to a particular ring segment and cannot be used to store
>> other data. There are tools to swap a node in, but you want your
>
> Do you mean "preference list" by "ring segment"? If all nodes in the
> preference list for a particular piece of data go down before another node
> takes over that data, the data cannot be read. But I thought writes could
> still be accepted through the mechanism called hinted handoff.
> However, the probability that all nodes in the preference list crash is very
> low. The same holds for HBase: if all the nodes holding a particular piece
> of HBase data go down, that data cannot be accessed.
>
> The original concern is whether I can answer "yes" in the following
> situation:
>
> Customer: The system is slow. How can we improve the responsiveness?
> Me: The response time of the data store IS high, and all region servers are
> busy. You can improve the situation by adding some region servers.
> Customer: Is just adding new servers OK? Will we see the improvement soon?

If you want more headroom you add servers. HBase will reassign
regions to those regionservers. You now have access to more CPU and
RAM, and a larger, more effective block cache. The existing data
doesn't get spread around on disk right away, but you can initiate
major compactions on some or all of the tables, which will move data
around immediately.  There are no concerns with growing a cluster in
this way - I have done it to double the size of a cluster and saw
immediate performance gains.  I major compacted a table I was running
a map reduce over and saw further improvements.  In a live serving
system you do NOT want to be hitting disk most of the time - caching
is the name of the game for reducing latency.  Everyone does this (you
think your Google results are read from disk?) and it's a fairly
uniform "law" of low latency services - RAM is king. And when you
expand an HBase cluster you get more effective RAM immediately - no
rebalancing required (unlike DHT-based architectures).
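
Kicking off that major compaction is a one-liner. A rough sketch with
the Java client - "mytable" is a stand-in for your own table name:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CompactTable {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
      // Asynchronous request: each regionserver hosting a region of
      // the table rewrites that region's store files, and the new
      // files end up local to whichever server now hosts the region.
      admin.majorCompact("mytable");
    }
  }

The same thing from the hbase shell: major_compact 'mytable'.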



>
> Maumau
>
>
> ----- Original Message ----- From: "Ryan Rawson" <ryano...@gmail.com>
> To: <hbase-user@hadoop.apache.org>
> Sent: Saturday, May 08, 2010 6:30 PM
> Subject: Re: How does HBase perform load balancing?
>
>
> hey,
>
> HBase currently uses region count to load balance.  Regions are
> assigned in a semi-randomish order across the regionservers.
>
> The paper is somewhat correct in that we are not moving data around
> aggressively, because then people would write in complaining we move
> data around too much :-)
>
> So a few notes: HBase is not a key-value store, it's a tabular data
> store which maintains key order and allows the easy construction of
> left-match key indexes.
>
> One other thing... if you are using a DHT (e.g. Cassandra), when a node
> fails the load moves to the other servers in the ring segment. For
> example, if you have N=3 and you lose a node in a segment, that node's
> load moves to the 2 other servers (each picking up roughly half of it).
> Your monitoring system should probably be tied into the DHT topology,
> since if a second node fails in
> the same ring segment you probably want to take action. Ironically, nodes
> in Cassandra are special (unlike the publicly stated info); they
> "belong" to a particular ring segment and cannot be used to store
> other data. There are tools to swap a node in, but you want your
> cluster management to be as automated as possible.
>
> In a bigtable-style architecture, by contrast, the load of a failed
> regionserver is spread evenly across the entire rest of the cluster.
> No node has a special role in HDFS or HBase; any data can be hosted
> and served from any node.  As nodes fail, as long as you have enough
> nodes left to serve the load you are in good shape. The HDFS missing
> block report lets you know when you have lost too many nodes.
>
> In the future we want to add load balancing based on
> requests/second.  We have all the requisite data and architecture, but
> other things are higher priority right now.  Pure region-count load
> balancing tends to work fairly well in practice.
>
>
>
