On Sat, May 8, 2010 at 5:50 PM, MauMau <maumau...@gmail.com> wrote:
> Hi, Ryan
>
> From: "Ryan Rawson" <ryano...@gmail.com>
>>
>> Here at StumbleUpon we handle 12,000 requests/second. Some
>> regionservers are a bit warmer than others, but it hasn't proven to be
>> a serious issue.
>
> Thank you for sharing your valuable experience. "12,000 requests/second" is
> great!
>
>
>> If you want more head room you add servers. HBase will reassign
>> regions to those regionservers. You now have access to more CPU and
>> RAM and have a larger and more effective block cache. The data doesn't
>> get spread around, but you can initiate major compactions on some/all
>> of the tables which will move data around immediately.  There are no
>> concerns about growing a cluster in this way - I have done it to double
>> the size of a cluster and I saw an immediate performance improvement.
>> I major compacted a table I was running a MapReduce job on and I saw
>> further performance improvements.  In a live serving system you do NOT
>> want to be accessing disk most of the time - caching is the name of the
>> game for reducing latency.  Everyone does this (you think your Google
>> results are read from disk?) and it's a fairly uniform "law" of
>> low-latency services - RAM is king. And when you expand an HBase
>> cluster you get more effective RAM immediately - no rebalancing
>> required (unlike DHT-based architectures).
>
> What I understood from the above is as follows. I'd appreciate it if you
> could point out where I am wrong.
>
> 1. I need to perform a major compaction to reassign regions from the
> existing, loaded region servers to a new region server.

This is not so - regions are automatically reassigned with no
compaction necessary.

> I cannot reassign the regions just by doing a minor compaction and letting
> the new, unloaded server perform a major compaction later. Having the loaded
> existing server do a heavy major compaction is a concern.

This is not what happens: regions are reassigned without requiring
compaction of any kind.

> 2. "no rebalancing required" means that the blocks of HDFS files for regions
> need not be moved from one datanode to another.

When you add new nodes to a cluster, data does not migrate unless you
are running the HDFS balancer.  As HBase naturally major compacts
tables (once a day by default), it ends up rewriting the data, which
causes it to migrate. You can accelerate this process by manually
kicking off a major compaction for a large table if you have added
a lot of new machines.
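
As a rough illustration of the point above, here is a toy Python model
(not HBase code). It assumes one data copy per region, a simple
round-robin assignment, and that a rewrite lands on the host serving the
region, since HDFS places the first replica of a new file on the local
datanode. All function and host names are made up for the sketch, and
the real HBase balancer does not work round-robin.

```python
# Toy model only: regions are served by region servers, while each region's
# store files live on some datanode. Names and policy are hypothetical.

def assign_round_robin(regions, servers):
    """Spread regions evenly across servers. No data is touched."""
    return {region: servers[i % len(servers)] for i, region in enumerate(regions)}

def locality(assignment, data_location):
    """Fraction of regions whose data sits on the host that serves them."""
    local = sum(1 for region, host in assignment.items()
                if data_location[region] == host)
    return local / len(assignment)

def major_compact(region, assignment, data_location):
    """Rewriting a region's files puts the new copy on the serving host."""
    data_location[region] = assignment[region]

regions = [f"region-{i}" for i in range(8)]
old_servers = ["rs1", "rs2"]

assignment = assign_round_robin(regions, old_servers)
data_location = dict(assignment)          # start with all data local
before = locality(assignment, data_location)

# Double the cluster: regions move right away, the data stays where it was.
assignment = assign_round_robin(regions, old_servers + ["rs3", "rs4"])
after_reassign = locality(assignment, data_location)

# Major compact every region: the rewrite makes the data local again.
for region in regions:
    major_compact(region, assignment, data_location)
after_compact = locality(assignment, data_location)

print(before, after_reassign, after_compact)
```

In this model the new servers take load as soon as regions are
reassigned, and only the fraction of locally stored data has to wait
for a compaction (or the HDFS balancer) to catch up.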

-ryan


>
> Thank you.
> Maumau
>
>
