How does HBase perform load balancing?

MauMau Sat, 08 May 2010 02:15:51 -0700

Hello,

I got the following error when I sent the mail.


Technical details of permanent failure:

Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.2) exceeded threshold (state 18).

The original mail might have been too long, so let me split it and send it again.

I'm comparing HBase and Cassandra, which I think are the most promising distributed key-value stores, to determine which one to choose for future OLTP and data analysis. I found the following benchmark report by Yahoo! Research, which evaluates HBase, Cassandra, PNUTS, and sharded MySQL.


http://wiki.apache.org/hadoop/Hbase/DesignOverview

The above report refers to HBase 0.20.3.

Reading this and HBase's documentation, two questions about load balancing and replication have arisen. Could anyone give me any information to help answer these questions?


[Q1] Load balancing

Does HBase move regions to a newly added region server (logically, not physically on storage) immediately? If not immediately, when?

On what criteria does the master unassign and assign regions among region servers? CPU load, read/write request rates, or just the number of regions the region servers are handling?

According to the HBase design overview on the page below, the master monitors the load of each region server and moves regions.


http://wiki.apache.org/hadoop/Hbase/DesignOverview

The related part is the following:

----------------------------------------
HMaster duties:

Assigning/unassigning regions to/from HRegionServers (unassigning is for load balance)

Monitor the health and load of each HRegionServer
...

If HMaster detects overloaded or low loaded HRegionServer, it will unassign (close) some regions from most loaded HRegionServer. Unassigned regions will be assigned to low loaded servers.

----------------------------------------
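To make the question concrete, here is a minimal sketch of what balancing purely on region count could look like. This is my assumption for illustration only, not HBase's actual code: the function name `plan_moves`, the `slop` parameter, and the server and region names are all hypothetical, and the real master may well use other criteria.

```python
from typing import Dict, List, Tuple


def plan_moves(assignments: Dict[str, List[str]],
               slop: int = 1) -> List[Tuple[str, str, str]]:
    """Return (region, from_server, to_server) moves that even out region counts.

    Hypothetical count-based policy: keep moving one region from the server
    holding the most regions to the server holding the fewest, until the two
    differ by at most `slop` regions.
    """
    # Work on a copy so the caller's view of the cluster is not modified.
    load = {server: list(regions) for server, regions in assignments.items()}
    moves: List[Tuple[str, str, str]] = []
    if not load:
        return moves

    # A slop below 1 could make a region bounce back and forth forever.
    slop = max(slop, 1)

    while True:
        busiest = max(load, key=lambda s: len(load[s]))
        idlest = min(load, key=lambda s: len(load[s]))
        if len(load[busiest]) - len(load[idlest]) <= slop:
            break
        # "Unassign" from the most loaded server, "assign" to the least loaded.
        region = load[busiest].pop()
        load[idlest].append(region)
        moves.append((region, busiest, idlest))
    return moves


# Two loaded servers and one newly added, empty server.
cluster = {
    "rs1": ["a", "b", "c", "d"],
    "rs2": ["e", "f", "g", "h"],
    "rs3": [],
}
for region, src, dst in plan_moves(cluster):
    print(f"move region {region}: {src} -> {dst}")
```

Under this assumed policy, a newly added server would receive regions as soon as the master next ran the balancer, without any data being copied on storage. Whether HBase behaves this way is exactly what I am asking.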

When I read the above, I thought that the master checks the load of region servers periodically (once every few minutes or so) and performs load balancing. I also thought that the master moves some regions from the existing loaded region servers to a newly added one immediately when the new server joins the cluster and contacts the master.

However, the benchmark report by Yahoo! Research describes it as follows. This says that HBase does not move regions until compaction, so I would not get the benefit of a new server immediately, even if I added it to relieve an overload.

Which is correct?

----------------------------------------
6.7 Elastic Speedup
As the figure shows, the read latency spikes initially after
the sixth server is added, before the latency stabilizes at a
value slightly lower than the latency for five servers. This result
indicates that HBase is able to shift read and write load
to the new server, resulting in lower latency. HBase does
not move existing data to the new server until compactions
occur [2]. The result is less latency variance compared to Cassandra
since there is no repartitioning process competing
for the disk. However, the new server is underutilized, since
existing data is served off the old servers.
...
[2] It is possible to run the HDFS load balancer to force data
to the new servers, but this greatly disrupts HBase’s ability
to serve data partitions from the same servers on which they
are stored.
----------------------------------------

MauMau
