I don't think the docs say anything about how the balancer works, but it is immediate: as soon as a region server makes the 'report for duty' call back to the master, the master begins reassigning regions. It can take a little time for regions to flush, close, and move, but this is done fairly efficiently and quickly.
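The docs don't describe the balancer's actual algorithm, so the following is only a hypothetical toy model of what "reassigning regions" to a newly reported server can look like. It assumes a simple count-based policy: keep moving one region from the busiest server to the idlest until their region counts differ by at most one. The function `rebalance` and the server names `rs1` through `rs4` are made up for illustration. In a real cluster each move also involves the flush and close steps mentioned above.

```python
from collections import Counter


def rebalance(assignment, servers):
    """Toy count-based balancer (NOT HBase's actual implementation).

    assignment: dict mapping region name -> server name
    servers:    list of live server names, including newly joined ones
    Returns (new_assignment, moves), where moves is a list of
    (region, from_server, to_server) tuples. The input dict is not modified.
    """
    new_assignment = dict(assignment)
    # Group regions by server; a server that just reported for duty starts empty.
    load = {s: [] for s in servers}
    for region, server in sorted(new_assignment.items()):
        load[server].append(region)

    moves = []
    while True:
        busiest = max(servers, key=lambda s: len(load[s]))
        idlest = min(servers, key=lambda s: len(load[s]))
        # Stop once region counts differ by at most one.
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        # In a real cluster this step means flush, close, then reopen elsewhere.
        region = load[busiest].pop()
        load[idlest].append(region)
        new_assignment[region] = idlest
        moves.append((region, busiest, idlest))
    return new_assignment, moves


if __name__ == "__main__":
    # 12 regions spread evenly over 3 servers, then an empty rs4 joins.
    regions = {f"r{i}": f"rs{i // 4 + 1}" for i in range(12)}
    balanced, moves = rebalance(regions, ["rs1", "rs2", "rs3", "rs4"])
    print(moves)
    print(Counter(balanced.values()))
```

In this example rs4 picks up one region from each of the other three servers, leaving three regions per server. Equalizing region counts is just one possible policy; this sketch ignores region size and request load entirely.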
The other points are quite accurate; thanks for the summary. If you are interested in writing blog articles or documentation, we would greatly appreciate it, and can link/embed it on our site/docs. There are only so many hours in the day, alas.

Thanks again,
-ryan

On Sat, May 8, 2010 at 6:34 PM, MauMau <maumau...@gmail.com> wrote:
> Thanks, Ryan,
>
> - To utilize the CPU and memory of new region servers:
> I do not have to do anything.
> The master automatically reassign existing regions to the new servers.
> (I'll search the docs for how soon the master performs reassignment when
> adding new servers)
> - To utilize the storage space and I/O capacity of new region servers:
> Choose or combine the following:
> 1. Automatic major compaction (once a day)
> 2. Perform major compaction explicitly
> 3. Use HDFS balancer
>
> Maumau
>
> ----- Original Message -----
> From: "Ryan Rawson" <ryano...@gmail.com>
> To: <hbase-user@hadoop.apache.org>
> Sent: Sunday, May 09, 2010 10:08 AM
> Subject: Re: How does HBase perform load balancing?
>
>> What I understood from the above is as follows. I'd appreciate if you
>> could point out if I am wrong.
>>
>> 1. I need to perform major-compaction to unassign regions from the
>> existing loaded region servers to a new region server.
>
> This is not so - regions are automatically reassigned with no
> compaction necessary.
>
>> I cannot reassign the regions just by doing minor compaction and letting
>> the non-loaded new server perform major compaction later. Having the
>> loaded existing server do heavy major compaction is a concern.
>
> this is not what happens, regions are reassigned without requiring any
> compaction of any kind.
>
>> 2. "no rebalancing required" means that the blocks of HDFS files for
>> regions need not be moved from one datanode to another.
>
> So when you add nodes to a new cluster, unless you are running the
> HDFS balancer, data does not migrate. As HBase naturally compacts
> tables (once a day by default) it will end up rewriting data and
> causing its migration. You can help accelerate this process by
> manually kicking off a compaction for a large table if you have added
> a lot of new machines.
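The quoted point that data only migrates once compaction rewrites it can be illustrated with another hypothetical toy model. It assumes each region server runs alongside an HDFS datanode, and it relies on HDFS's default placement of a block's first replica on the writer's own node when the writer runs on a datanode. The `Region` class and the `locality` and `major_compact` helpers are made up for illustration; they only track where the first replica of each store file lives and ignore the other replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical region: where it is served and where its files' first replicas live."""
    name: str
    host: str  # region server currently serving the region
    file_nodes: list = field(default_factory=list)  # datanode holding each store file's first replica


def locality(regions):
    """Fraction of store files whose first replica sits on the serving host."""
    pairs = [(r.host, node) for r in regions for node in r.file_nodes]
    if not pairs:
        return 1.0
    return sum(host == node for host, node in pairs) / len(pairs)


def major_compact(region):
    """Merge all store files into one new file written by the serving host.

    Under the assumed default placement, the new file's first replica lands on
    the writer's own datanode, so the region's data becomes local.
    """
    if region.file_nodes:
        region.file_nodes = [region.host]


if __name__ == "__main__":
    # Two regions were just reassigned to the new server rs4; their store files
    # were written earlier by rs1 and rs2, so reads from rs4 are all remote.
    moved = [Region("r3", "rs4", ["rs1", "rs1"]), Region("r7", "rs4", ["rs2"])]
    print("before compaction:", locality(moved))
    for region in moved:
        major_compact(region)
    print("after compaction:", locality(moved))
```

Here locality goes from 0.0 before compaction to 1.0 after it, which mirrors why kicking off a compaction manually speeds up data migration after adding many new machines.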