On Sat, May 8, 2010 at 5:50 PM, MauMau <maumau...@gmail.com> wrote:
> Hi, Ryan
>
> From: "Ryan Rawson" <ryano...@gmail.com>
>>
>> Here at Stumbleupon we handle 12,000 requests/second, some
>> regionservers are a bit warmer than others, but it hasn't proven to be
>> a serious issue.
>
> Thank you for sharing your precious experience. "12,000 requests/second" is
> great!
>
>
>> If you want more head room you add servers. HBase will reassign
>> regions to those regionservers. You now have access to more CPU and
>> RAM and have a larger and more effective block cache. The data doesn't
>> get spread around, but you can initiate major compactions on some/all
>> of the tables which will move data around immediately. There are no
>> concerns for growing a cluster in this way - I have done it to double
>> the size of a cluster and I saw an immediate performance improvement. I major
>> compacted a table I was doing a map reduce on and I saw more
>> performance improvements. In a live serving system you do NOT want to
>> be accessing disk most of the time - caching is the name of the game for
>> reducing latency. Everyone does this (you think your Google results
>> are read from disk?) and it's a fairly uniform "law" of doing low
>> latency services - RAM is king. And when you expand an HBase cluster
>> you get more effective RAM immediately - no rebalancing required
>> (unlike DHT-based architectures).
>
> What I understood from the above is as follows. I'd appreciate it if you could
> point out if I am wrong.
>
> 1. I need to perform major-compaction to unassign regions from the existing
> loaded region servers to a new region server.

This is not so - regions are automatically reassigned, with no compaction
necessary.

> I cannot reassign the regions just by doing minor compaction and letting the
> non-loaded new server perform major compaction later. Having the loaded
> existing server do heavy major compaction is a concern.

This is not what happens. Regions are reassigned without requiring
compaction of any kind.

> 2. "no rebalancing required" means that the blocks of HDFS files for regions
> need not be moved from one datanode to another.

When you add nodes to a cluster, data does not migrate unless you are
running the HDFS balancer. Because HBase naturally compacts tables (once a day
by default), it ends up rewriting the data, and that rewrite is what causes
the migration. You can accelerate this process by manually kicking off a
major compaction of a large table if you have added a lot of new machines.

-ryan

>
> Thank you.
> Maumau
>
>
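
The distinction above (reassignment is immediate, data only moves when it is rewritten) can be sketched with a toy model. This is not HBase code. The function names, the round-robin assignment, and the one-location-per-region simplification are all assumptions made for illustration. The real balancer moves far fewer regions, and HDFS keeps several replicas of each block.

```python
def assign_regions(regions, servers):
    """Toy round-robin assignment (hypothetical); it changes metadata only."""
    return {region: servers[i % len(servers)] for i, region in enumerate(regions)}


def locality(assignment, block_location):
    """Fraction of regions whose files sit on the node that serves them."""
    local = sum(1 for region, server in assignment.items()
                if block_location[region] == server)
    return local / len(assignment)


def major_compact(assignment, block_location, regions):
    """Rewrite each region's files. In this model the rewritten copy is
    stored on the server that currently hosts the region."""
    for region in regions:
        block_location[region] = assignment[region]


regions = [f"r{i}" for i in range(8)]
assignment = assign_regions(regions, ["rs1", "rs2"])
block_location = dict(assignment)  # start with every region's files local

# Add two servers: regions are reassigned at once, but the files stay put.
assignment = assign_regions(regions, ["rs1", "rs2", "rs3", "rs4"])
before = locality(assignment, block_location)

# A major compaction rewrites the files, which is what moves the data.
major_compact(assignment, block_location, regions)
after = locality(assignment, block_location)

print(before, after)  # prints: 0.5 1.0
```

In this model the new servers start serving regions immediately, reading the files remotely until a compaction rewrites them. That is why no compaction is needed for reassignment, and why a manual major compaction only speeds up a migration that would happen anyway.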