Hi, Ryan

From: "Ryan Rawson" <ryano...@gmail.com>
Here at Stumbleupon we handle 12,000 requests/second. Some regionservers are a bit warmer than others, but it hasn't proven to be a serious issue.
Thank you for sharing your valuable experience. "12,000 requests/second" is great!
If you want more headroom, you add servers. HBase will reassign regions to those regionservers. You now have access to more CPU and RAM and have a larger and more effective block cache. The data doesn't get spread around, but you can initiate major compactions on some/all of the tables, which will move data around immediately. There are no concerns about growing a cluster in this way - I have done it to double the size of a cluster and I saw an immediate performance improvement. I major compacted a table I was running a map reduce on and I saw further performance improvements. In a live serving system you do NOT want to be accessing disk most of the time - caching is the name of the game for reducing latency. Everyone does this (you think your Google results are read from disk?) and it's a fairly uniform "law" of doing low latency services - RAM is king. And when you expand an HBase cluster you get more effective RAM immediately - no rebalancing required (unlike DHT-based architectures).
What I understood from the above is as follows. I'd appreciate it if you could point out where I am wrong.
1. I need to perform a major compaction to reassign regions from the existing, loaded region servers to a new region server. I cannot reassign the regions just by doing minor compactions and letting the non-loaded new server perform the major compaction later. Having the loaded existing servers do a heavy major compaction is a concern.
2. "No rebalancing required" means that the blocks of the HDFS files for the regions need not be moved from one datanode to another.
Thank you.
Maumau