Hi, Ryan

From: "Ryan Rawson" <ryano...@gmail.com>
Here at Stumbleupon we handle 12,000 requests/second. Some
regionservers are a bit warmer than others, but it hasn't proven to be
a serious issue.

Thank you for sharing your valuable experience. 12,000 requests/second is impressive!


If you want more head room you add servers. HBase will reassign
regions to those regionservers. You now have access to more CPU and
RAM and have a larger and more effective block cache. The data doesn't
get spread around, but you can initiate major compactions on some/all
of the tables which will move data around immediately.  There are no
concerns for growing a cluster in this way - I have done it to double
the size of a cluster and I saw immediate performance gains.  I major
compacted a table I was doing a map reduce on and I saw more
performance improvements.  In a live serving system you do NOT want to
be accessing disk most of the time - caching is the name of the game for
reducing latency.  Everyone does this (you think your Google results
are read from disk?) and it's a fairly uniform "law" of doing low
latency services - RAM is king. And when you expand an HBase cluster
you get more effective RAM immediately - no rebalancing required
(unlike DHT-based architectures).
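
To make the "more effective RAM immediately" point concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code: the function names and every number in it (hot data size, per-server cache size, region and server counts) are hypothetical, chosen only to illustrate the arithmetic. It assumes regions are spread evenly and that each server contributes the same amount of block cache.

```python
def assign_regions(num_regions, num_servers):
    """Spread regions round-robin; return the region count per server."""
    counts = [0] * num_servers
    for region in range(num_regions):
        counts[region % num_servers] += 1
    return counts


def cache_coverage(hot_data_gb, num_servers, cache_per_server_gb):
    """Fraction of the hot data set that fits in the combined block cache."""
    total_cache_gb = num_servers * cache_per_server_gb
    return min(1.0, total_cache_gb / hot_data_gb)


# Hypothetical cluster: 100 regions, 80 GB of hot data, 4 GB cache per server.
before = cache_coverage(hot_data_gb=80, num_servers=10, cache_per_server_gb=4)
after = cache_coverage(hot_data_gb=80, num_servers=20, cache_per_server_gb=4)

print(assign_regions(100, 10)[0], before)  # regions per server, coverage
print(assign_regions(100, 20)[0], after)
```

Under these assumed numbers, doubling the servers halves the regions each one serves and doubles the share of hot data that can sit in cache, without any data having to move on disk first.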

Here is what I understood from the above. I'd appreciate it if you could point out anything I have wrong.

1. I need to run a major compaction to move regions off the existing, loaded region servers and onto a new region server. I cannot reassign the regions with only a minor compaction and leave the major compaction to the lightly loaded new server later. Having the already loaded existing servers perform a heavy major compaction is a concern.

2. "No rebalancing required" means that the HDFS file blocks for the regions need not be moved from one datanode to another.

Thank you.
Maumau
