>> So the question is how large to make your regions if you have 100s of
>> TBs?
Yeah.. I realize it would depend on the number of nodes and their specs..
My question was tilted more towards asking how many regions per node HBase
would be comfortable with (at least as of now) under the default config..
Assume an off-the-shelf machine for simplicity..
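For a rough sense of the arithmetic, here is a minimal back-of-envelope sketch, assuming the stock hbase.hregion.max.filesize split threshold and a recent Java client API; the 2 TB/node and 2 GB/region figures are purely illustrative, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionSizing {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Raise the split threshold to 2 GB regions (the stock default
        // was far smaller); in practice this belongs in hbase-site.xml.
        conf.setLong("hbase.hregion.max.filesize", 2L * 1024 * 1024 * 1024);

        // Back-of-envelope: regions per node = data per node / region size.
        long dataPerNode = 2L * 1024 * 1024 * 1024 * 1024;               // 2 TB, illustrative
        long regionSize = conf.getLong("hbase.hregion.max.filesize", 0); // 2 GB, set above
        System.out.println("Approx. regions per node: " + dataPerNode / regionSize);
    }
}
```

With those numbers it works out to about 1,024 regions per node; since each open region costs regionserver memory, that per-node count is the quantity to watch as the dataset grows.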
>> Same with major compactions (you would definitely need to turn them off
>> and control them manually if you need them at all).
Huh.. But that can affect reads and scans.. Further, the problem is
aggravated if we want to do online writes: a steady stream of updates to
the database (whose volume, let's say, is roughly an order of magnitude
lower than the size of the DB) may require that compactions be done
regularly. Isn't that right?

V


On 5/17/10 12:26 PM, "Jonathan Gray" <jg...@facebook.com> wrote:

So the question is how large to make your regions if you have 100s of
TBs? How many nodes will this be on and what are the specs of each node?

Many people run with 1-2GB regions or higher. Primarily the issue will be
memory usage and also the propensity for splitting.

With that dataset size, you'll need to be careful about splitting too
much because rewrites of data are expensive. Same with major compactions
(you would definitely need to turn them off and control them manually if
you need them at all).

> -----Original Message-----
> From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> Sent: Monday, May 17, 2010 12:19 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Additional disk space required for Hbase compactions..
>
> >> I'm not sure I understand why you distinguish small HFiles and a
> >> single behemoth HFile? Are you trying to understand more
> >> about disk space or I/O patterns?
> I was talking with respect to an application I had in mind.. Right now,
> I am considering just disk space..
>
> Ryan's comment:
> >> Yes, compactions happen on HDFS. HBase will only compact one region
> >> at a time per regionserver, so in theory you will need k × max(all
> >> region sizes).
> So the U and M from my mail are sizes per region. Am I right? So what
> is a good cutoff region size for hundreds of TBs of data to be stored
> in HBase? I am wondering if this has ever been attempted..
>
> Vidhya
>
>
> > -----Original Message-----
> > From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> > Sent: Monday, May 17, 2010 11:56 AM
> > To: hbase-user@hadoop.apache.org
> > Cc: Joel Koshy
> > Subject: Additional disk space required for Hbase compactions..
> >
> > Hi guys,
> > I am quite new to HBase.. I am trying to figure out the maximum
> > additional disk space required for compactions..
> >
> > Suppose the set of small HFiles amounts to a total size of U before a
> > major compaction happens, the 'behemoth' HFile has size M, the
> > resultant HFile after compaction has size U+M (the worst case, with
> > only insertions), and the replication factor is r. Then the disk space
> > taken by the HFiles is 2r(U+M).. Is this estimate reasonable? (This is
> > also based on my understanding that compactions happen on HDFS and not
> > on the local file system: am I correct?)...
> >
> > Thank you
> > Vidhya
> >
>
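On the "turn them off and control them manually" advice, here is a minimal sketch of what that could look like, assuming the stock hbase.hregion.majorcompaction interval property and the old-style HBaseAdmin client API; the table name "mytable" is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualMajorCompaction {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // An interval of 0 disables the periodic major compactions;
        // in practice this setting belongs in hbase-site.xml on every
        // regionserver, not in client code.
        conf.setLong("hbase.hregion.majorcompaction", 0);

        // Then trigger one by hand during an off-peak window. The call
        // is asynchronous: it queues the compaction, it does not wait.
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.majorCompact("mytable"); // hypothetical table name
    }
}
```

Plugging illustrative numbers into the 2r(U+M) estimate above: with r = 3 and one region whose files total U + M = 2 GB, the transient peak while that region rewrites is about 2 × 3 × 2 GB = 12 GB, and because a regionserver compacts one region at a time, the overhead is per region being compacted, not per table.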