Thanks, Ryan,
- To utilize the CPU and memory of new region servers:
I do not have to do anything.
The master automatically reassigns existing regions to the new servers.
(I'll search the docs for how soon the master performs reassignment when
adding new servers)
- To utilize the storage space and I/O capacity of new region servers:
Choose or combine the following:
1. Automatic major compaction (once a day)
2. Perform major compaction explicitly
3. Use HDFS balancer
Maumau
----- Original Message -----
From: "Ryan Rawson" <ryano...@gmail.com>
To: <hbase-user@hadoop.apache.org>
Sent: Sunday, May 09, 2010 10:08 AM
Subject: Re: How does HBase perform load balancing?
What I understood from the above is as follows. I'd appreciate it if you
could point out if I am wrong.
1. I need to perform a major compaction to move regions from the existing
loaded region servers to a new region server.
This is not so - regions are automatically reassigned with no
compaction necessary.
I cannot reassign the regions just by doing a minor compaction and letting
the non-loaded new server perform a major compaction later. Having the
loaded existing server do a heavy major compaction is a concern.
This is not what happens; regions are reassigned without requiring
compaction of any kind.
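
To illustrate the point above, here is a toy Python sketch of count-based
region balancing. It is not HBase's actual balancer: the function name
rebalance, the greedy strategy, and the server and region names are all
assumptions made up for illustration. It only shows that reassignment
changes the region-to-server mapping and never touches store files.

```python
def rebalance(assignment, servers):
    """Toy model: spread regions evenly across servers by region count.

    assignment maps region -> server; every server it mentions must be
    listed in servers. Returns a new mapping. Only the mapping changes;
    nothing here reads, rewrites, or compacts any data.
    """
    by_server = {s: [] for s in servers}
    for region, server in sorted(assignment.items()):
        by_server[server].append(region)

    # Ceiling of the average number of regions per server.
    target_max = -(-len(assignment) // len(servers))

    new_assignment = dict(assignment)
    for server in servers:
        while len(by_server[server]) > target_max:
            region = by_server[server].pop()
            # Hand the region to whichever server currently holds the fewest.
            dest = min(servers, key=lambda s: len(by_server[s]))
            by_server[dest].append(region)
            new_assignment[region] = dest
    return new_assignment


# Hypothetical cluster: rs3 has just been added and serves nothing yet.
servers = ["rs1", "rs2", "rs3"]
before = {"r1": "rs1", "r2": "rs1", "r3": "rs1",
          "r4": "rs2", "r5": "rs2", "r6": "rs2"}
after = rebalance(before, servers)
print(after)
```

In this model the new server picks up regions immediately, while the
files backing those regions stay wherever they were written.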
2. "no rebalancing required" means that the blocks of HDFS files for
regions need not be moved from one datanode to another.
So when you add new nodes to a cluster, data does not migrate unless you
are running the HDFS balancer. As HBase naturally compacts tables (major
compaction runs once a day by default), it ends up rewriting the data,
which causes it to migrate. You can help accelerate this process by
manually kicking off a compaction for a large table if you have added
a lot of new machines.
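
The effect described above can be sketched with a small Python model.
It assumes that each region server runs alongside a datanode, that each
region has a single store file, and that only one replica is tracked;
the names local_fraction and major_compact and the node names are
hypothetical. It relies on the fact that HDFS places the first replica
of a newly written file on the writer's local datanode.

```python
def local_fraction(region_host, file_host):
    """Fraction of regions whose store file sits on the serving host."""
    local = sum(1 for r, h in region_host.items() if file_host[r] == h)
    return local / len(region_host)


def major_compact(region_host, file_host, regions=None):
    """Toy model of a major compaction.

    Each compacted region's data is rewritten by the server currently
    hosting it, so the new file lands on that server's local datanode.
    Pass regions to compact only some of them; the default is all.
    """
    rewritten = dict(file_host)
    for r in (regions if regions is not None else region_host):
        rewritten[r] = region_host[r]
    return rewritten


# Files were written before node3 joined the cluster.
file_host = {"r1": "node1", "r2": "node1", "r3": "node2", "r4": "node2"}
# After reassignment, node3 serves r2 and r4, but their files have not moved.
region_host = {"r1": "node1", "r2": "node3", "r3": "node2", "r4": "node3"}

print(local_fraction(region_host, file_host))  # 0.5
print(local_fraction(region_host, major_compact(region_host, file_host)))  # 1.0
```

Reassignment alone leaves half the regions reading remote data in this
example; the rewrite done by the compaction is what brings the data to
the new node.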