Yossi,

No, you may transiently have more than one file per region, e.g. right after a cache flush and before the next compaction merges the files. See "Cache Flushes" here: http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion
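That flush-then-compact cycle can be sketched as a toy model. The `Region` class and its methods here are illustrative only, not HBase internals; the 64MB threshold is the default Memcache flush size from hbase-default.xml:

```python
# Toy model of the flush/compaction cycle: each Memcache flush adds
# an on-disk file to the region, and a compaction merges them back
# into one. Class and method names are illustrative, not HBase's API.
class Region:
    def __init__(self):
        self.memcache = 0          # bytes buffered in memory
        self.store_files = 0       # on-disk MapFiles for the family

    def write(self, nbytes, flush_threshold=64 * 1024 * 1024):
        self.memcache += nbytes
        if self.memcache >= flush_threshold:
            self.memcache = 0
            self.store_files += 1  # each flush adds one file

    def compact(self):
        # compaction merges all the region's files into a single one
        self.store_files = min(self.store_files, 1)

r = Region()
for _ in range(3):
    r.write(64 * 1024 * 1024)      # three flushes...
print(r.store_files)               # 3 files before compaction
r.compact()
print(r.store_files)               # back to 1
```

So "1 family = exactly 1 file" only holds between compactions, which is why the wiki section above is worth a read.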
Changing the max file size to 512MB will decrease memory consumption, but it will also affect how well your data is distributed, since you will have half the number of regions. Andrew Purtell did something like that with his semi-production cluster: he started with a max size of 64MB to get early distribution, then gradually upped that number to, IIRC, 512MB.

Something I forgot in my last post is that the namenode also eats a lot of memory, because it holds the whole namespace in cache, and there is nothing you can do about it. See the Hadoop documentation about that.

J-D

On Thu, Oct 23, 2008 at 9:28 AM, Yossi Ittach <[EMAIL PROTECTED]> wrote:
> J-D,
>
> I have only one table (huge), with only 1 family - which means every
> region has exactly 1 file. Does it mean that I can significantly
> decrease the size of the MapFile indexes in memory?
> Also, what do you think will be the impact of increasing the region
> size (from 256 to 512, for example), in this scenario?
>
> Vale et me ama
> Yossi
>
> On Thu, Oct 23, 2008 at 2:35 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> > Yossi,
> >
> > Yeah, they will go up, because each datanode keeps its MapFile
> > indexes in memory and the regionservers keep a Memcache of max 64MB
> > (configurable, see hbase-default.xml) for each region they own.
> >
> > Rule of thumb? Well, in hbase-default the maximum a single family can
> > grow inside a single region is 256MB, so you can estimate the number
> > of regions you will have, but it also depends on the number of tables
> > and families. For example, if you have a single table with 10 equally
> > filled families, you should expect around 12 regions. Only one
> > family? Roughly 120 regions.
> >
> > So, based on that number of regions, you can extrapolate the memory
> > needed to host your system. Big nodes with 16GB of memory will host
> > way more regions than an EC2 small instance.
> >
> > J-D
> >
> > On Thu, Oct 23, 2008 at 8:22 AM, Yossi Ittach <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for the quick reply.
> > >
> > > I'm following the JVM memory consumption (using "top"), and what
> > > bothers me is that the percentages seem to just go up and up, and
> > > it makes me kind of worried.
> > >
> > > I'm trying to load the system with 30GB of data (this is a
> > > benchmark). I estimate that my production environment will require
> > > at least 3 times that size. Is there a rule of thumb as to how many
> > > region servers I'll need?
> > >
> > > Vale et me ama
> > > Yossi
> > >
> > > On Thu, Oct 23, 2008 at 2:14 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yossi,
> > > >
> > > > The META region is usually heavily used, and it's at its worst
> > > > when you use the web UI. Just for the lolz, go to the Master's
> > > > page (the main page) and hit "refresh" a couple of times; you
> > > > should see that number go way up.
> > > >
> > > > As for how to avoid it, well, the only way to split that load
> > > > would be to have the META region do a split, but that would
> > > > require a lot of data, hence a lot of user regions, which I doubt
> > > > you have on 2 machines.
> > > >
> > > > J-D
> > > >
> > > > On Thu, Oct 23, 2008 at 8:08 AM, Yossi Ittach <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Using HBase with 2 region servers on similar machines, I see
> > > > > that one machine is serving almost 400 requests per second,
> > > > > while the other one is serving 0-10. This causes extreme
> > > > > overload on the first machine. Any idea what causes it, or how
> > > > > it can be avoided?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Vale et me ama
> > > > > Yossi
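The rule-of-thumb arithmetic from J-D's earlier reply in the thread can be sketched as a back-of-envelope estimate. `estimate_regions` is a hypothetical helper, not an HBase API; it assumes the default 256MB per-family limit that hbase-default.xml sets via `hbase.hregion.max.filesize`:

```python
# Back-of-envelope region-count estimate for a single-family table:
# total data size divided by the per-region max file size, rounded up.
def estimate_regions(total_bytes, max_filesize_bytes=256 * 1024 * 1024):
    """Rough number of regions a single-family table splits into."""
    return max(1, -(-total_bytes // max_filesize_bytes))  # ceiling division

# Yossi's 30GB benchmark at the default 256MB max file size:
print(estimate_regions(30 * 1024**3))                     # 120 regions
# Doubling the max size to 512MB roughly halves the region count:
print(estimate_regions(30 * 1024**3, 512 * 1024 * 1024))  # 60 regions
```

This matches the "roughly 120 regions" figure quoted above, and shows why raising the max file size to 512MB trades memory use against distribution: half the regions means half the per-region Memcache and index overhead, but also fewer units to spread across region servers.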