If this approach were taken and "new" regions were created instead of splitting, then because the keys are monotonically increasing you would always have one "hot" region receiving all of the new data. That would limit load throughput.
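A common way around that hot region (a standard pattern, not something proposed in this thread) is to salt the rowkey: prefix each monotonically increasing id with a hash-derived bucket so writes spread across several regions. This is just a minimal sketch; the bucket count and key format here are hypothetical choices.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical number of pre-split regions

def salted_key(sequential_id: int) -> str:
    """Prefix a monotonically increasing id with a hash-derived bucket.

    Writes then spread across NUM_BUCKETS regions instead of all
    landing in the single region that owns the tail of the key space.
    """
    digest = hashlib.md5(str(sequential_id).encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    # zero-pad both parts so keys still sort sensibly within a bucket
    return f"{bucket:02d}-{sequential_id:012d}"

# print a few example keys; consecutive ids scatter across buckets
# (collisions are of course possible with only 16 buckets)
for i in range(100, 104):
    print(salted_key(i))
```

The trade-off is the usual one: range scans over the original key order now require one scan per bucket, which is why the question below about keeping sequential keys but changing the split strategy is interesting.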
-----Original Message-----
From: Fernando Padilla [mailto:f...@alum.mit.edu]
Sent: Wednesday, June 15, 2011 3:28 PM
To: user@hbase.apache.org
Subject: Re: random versus ordered input strategies

I'm sorry to butt in, but I have a question about that "split region in half" growth strategy. If it's too off-topic, you can ignore it.

I know that people always complain about using HBase for monotonically increasing inputs, and recommend using a random or hashed key strategy to distribute the load across servers. I've always wondered whether you could change the "split region in half" strategy to "create new region", and whether such a strategy could be useful for people who want monotonically increasing inputs. It wouldn't be as efficient as fully randomly distributed keys, but it might be a good enough compromise to keep code complexity down.

From your example: if you have an almost full region with keys 1-10 and I go to add key 11, right now it would split in half: 1-5, 6-11, incurring the overhead of creating new regions and doing lots of data copying to create all of the new region data files, etc. To add further insult, once that new region is close to full we have 1-5, 6-16; once 17 is added, it goes through a region split once again: 1-5, 6-10, 11-16, leaving half-filled regions.

What if instead, when you had a nearly full region with keys 1-10 and then added key 11, because it's in a "monotonically increasing inputs" mode it left region 1-10 alone and simply created a brand new region, 11-11 (hopefully on a different region server)?

I know that someone must have thought about this already, and must have good reasons not to follow it, but I've never seen this discussed in the architecture docs or on the mailing list... so just wondering... :)

On 6/15/11 10:58 AM, Chris Tarnas wrote:
> There are a few ways:
>
> 1) Dynamically as data is added. You start with one region and all data goes there. When a region grows too big, it gets split in half.
> So if a region had keys 1-10, we now have 1-5 and 5-10.
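The overhead difference the question raises can be seen in a toy model. This is only a sketch under stated assumptions, not HBase's actual split logic: region capacity is a made-up constant, a "split" is modeled as rewriting every row of the full region, and the "create new region" mode assumes purely monotonic inserts so the tail region is always the one that fills.

```python
REGION_CAPACITY = 10  # hypothetical rows per region, for illustration only

def split_in_half(n_keys):
    """Insert keys 1..n; when a region overflows, split it in half.

    Returns (regions, rows_copied), where rows_copied counts rows
    rewritten into new store files during splits.
    """
    regions = [[]]           # each region is a sorted list of keys
    copied = 0
    for k in range(1, n_keys + 1):
        tail = regions[-1]   # monotonic keys always land in the last region
        tail.append(k)
        if len(tail) > REGION_CAPACITY:
            mid = len(tail) // 2
            regions[-1] = tail[:mid]
            regions.append(tail[mid:])
            copied += len(tail)   # both halves get rewritten
    return regions, copied

def create_new_region(n_keys):
    """Insert keys 1..n; when the tail region fills, open an empty one."""
    regions = [[]]
    copied = 0
    for k in range(1, n_keys + 1):
        if len(regions[-1]) >= REGION_CAPACITY:
            regions.append([])    # no data movement at all
        regions[-1].append(k)
    return regions, copied

# With 30 monotonic keys, split-in-half repeatedly rewrites the tail
# region and leaves half-filled regions behind, while create-new-region
# copies nothing and leaves full regions.
```

In this model the "create new region" mode indeed avoids all copying, which is the appeal of the idea; what it does not show is the hot-region effect, since either way every insert goes to the single tail region.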