On Mon, May 17, 2010 at 3:03 PM, Jonathan Gray <jg...@facebook.com> wrote:

> I'm not sure I understand why you distinguish between the small HFiles and
> a single behemoth HFile.  Are you trying to understand more about disk
> space or I/O patterns?
>
> It looks like your understanding is correct.  At the worst point, a given
> Region will use twice its disk space during a major compaction.  Once the
> compaction is complete, the original files are deleted from HDFS.  So it is
> not the case that your entire dataset will require double the space for
> compactions, since they are not all run concurrently.
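>
> As a rough illustration (numbers made up): if a region's files total
> U + M = 10 GB and the HDFS replication factor is r = 3, then while the
> compaction runs both the old files and the new file exist, so that region
> peaks at about 2 x 3 x 10 GB = 60 GB of raw disk, dropping back to
> ~30 GB once the old files are deleted.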
>
> JG
>
> > -----Original Message-----
> > From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> > Sent: Monday, May 17, 2010 11:56 AM
> > To: hbase-user@hadoop.apache.org
> > Cc: Joel Koshy
> > Subject: Additional disk space required for Hbase compactions..
> >
> > Hi guys,
> >   I am quite new to HBase. I am trying to figure out the maximum
> > additional disk space required for compactions.
> >
> >   Suppose the set of small HFiles totals size U before a major
> > compaction happens, and the 'behemoth' HFile has size M. Assuming the
> > resulting HFile after compaction has size U+M (the worst case, with
> > insertions only) and a replication factor of r, the peak disk space
> > taken by the HFiles is 2r(U+M). Is this estimate reasonable? (This is
> > also based on my understanding that compactions happen on HDFS and not
> > on the local file system: am I correct?)
> >
> > Thank you
> > Vidhya
> >
>
>

I do not have the answer, but let me warn you that major compaction is VERY
VERY I/O intensive. In our case, it took a system that would typically
respond to get requests in 1-2 ms up to 30 ms, or 2000 ms at times. That was
not acceptable for us. I think in most normal cases it is not a good idea to
force major compactions.
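
If you want to avoid the automatic major compactions and instead trigger
them yourself during off-peak hours, one knob worth looking at (this is
from the 0.20.x line; check the docs for your version) is
hbase.hregion.majorcompaction in hbase-site.xml. Setting the period to 0
disables the time-based major compactions:

  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>

You can then run them manually from the HBase shell (major_compact
'tablename') when load is low.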
