So the question is how large to make your regions if you have 100s of TBs?

How many nodes will this be on and what are the specs of each node?

Many people run with 1-2GB regions or higher.

The primary issues will be memory usage and the propensity for splitting.
With a dataset that size, you'll need to be careful not to split too much,
because rewrites of data are expensive.  The same goes for major compactions
(you would definitely need to turn them off and control them manually, if you
need them at all).
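Concretely, both knobs live in hbase-site.xml. A sketch, assuming the stock property names (`hbase.hregion.max.filesize` caps how big a region's store file gets before a split, and setting `hbase.hregion.majorcompaction` to 0 disables the periodic time-based major compactions); the 2 GB value is just one point in the 1-2GB range mentioned above:

```xml
<!-- hbase-site.xml -->
<!-- Let regions grow to ~2 GB before splitting (value is in bytes). -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>2147483648</value>
</property>
<!-- Disable time-based major compactions; trigger them manually instead. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With the timer off, you can still kick off a major compaction by hand from the HBase shell (e.g. `major_compact 'tablename'`) during an off-peak window.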

> -----Original Message-----
> From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> Sent: Monday, May 17, 2010 12:19 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Additional disk space required for Hbase compactions..
> 
> >> I'm not sure I understand why you distinguish small HFiles and a
> single behemoth HFile?  Are you trying to understand more
> >> about disk space or I/O patterns?
> Was talking wrt an application I had in mind.. Right now, I am
> considering just disk space..
> 
> Ryan's comment:
> >> Yes compactions happen on hdfs. Hbase will only compact one region
> at a time
> >> per regionserver, so in theory you will need k * max(all region
> sizes).
> So the U and M from my mail are sizes per region. Am I right? So what
> is a good cutoff region size for hundreds of TB of data to be stored in
> hbase? I am wondering if this has ever been attempted..
> 
> Vidhya
> 
> 
> > -----Original Message-----
> > From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> > Sent: Monday, May 17, 2010 11:56 AM
> > To: hbase-user@hadoop.apache.org
> > Cc: Joel Koshy
> > Subject: Additional disk space required for Hbase compactions..
> >
> > Hi guys,
> >   I am quite new to Hbase.. I am trying to figure out the max
> > additional disk space required for compactions..
> >
> >   If the set of small Hfiles amount to a size of U in total, before a
> > major compaction happens and the 'behemoth' HFile has size M,
> assuming
> > the resultant size of the Hfile after compaction is U+M (worst case
> has
> > only insertions) and a replication factor of r, then disk space taken
> > by the Hfiles is 2r(U+M).. Is this estimate reasonable? (This also is
> > based on my understanding that compactions happen on HDFS and not on
> > the local file system: am I correct?)...
> >
> > Thank you
> > Vidhya
> >
> 
