Yes, compactions happen on HDFS. HBase only compacts one region at a time
per regionserver, so in theory you need k × max(all region sizes) of extra
space, where k is the number of regionservers.
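
To put a number on that bound, here is a rough Python sketch. The function
name, the example sizes, and the optional replication multiplier are my own
assumptions for illustration; nothing here comes from HBase itself.

```python
def compaction_headroom(region_sizes_gb, num_regionservers, replication=1):
    """Worst-case extra space if every regionserver is compacting a region
    as large as the biggest one at the same time.

    replication is an assumed multiplier for HDFS block replicas; leave it
    at 1 to get the logical (unreplicated) size.
    """
    if not region_sizes_gb:
        return 0
    return num_regionservers * max(region_sizes_gb) * replication


# Hypothetical cluster: 5 regionservers, largest region 10 GB, 3 replicas.
print(compaction_headroom([4, 10, 7], num_regionservers=5, replication=3))  # 150
```

This only covers the compaction output itself, not space that is still held
by files waiting to be deleted.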

But HDFS does a delayed delete, so the space held by deleted files is not
freed immediately. You could end up needing much more disk space than that.

Considering that HDFS disks should be the cheapest disks you own (data
drives in a low-density configuration), hopefully it won't be hard to
overprovision.

On May 17, 2010 11:57 AM, "Vidhyashankar Venkataraman" <
vidhy...@yahoo-inc.com> wrote:

Hi guys,
 I am quite new to HBase. I am trying to figure out the maximum additional
disk space required for compactions.

 If the small HFiles amount to a total size of U before a major compaction
and the 'behemoth' HFile has size M, then, assuming the resulting HFile has
size U+M after compaction (the worst case, with only insertions) and a
replication factor of r, the disk space taken by the HFiles is 2r(U+M). Is
this estimate reasonable? (This is also based on my understanding that
compactions happen on HDFS and not on the local file system: am I correct?)
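
The estimate above can be written out as a small Python sketch. It assumes
the old HFiles and the newly written one coexist until the compaction
finishes; the function name and the example numbers are hypothetical.

```python
def peak_hfile_space(u, m, r):
    """Peak disk space during a major compaction, per the 2r(U+M) estimate.

    u: total size of the small HFiles
    m: size of the large ('behemoth') HFile
    r: HDFS replication factor
    """
    before = r * (u + m)  # existing HFiles, all replicas
    after = r * (u + m)   # compacted HFile, worst case: insertions only
    return before + after


# Hypothetical sizes in GB: U = 20, M = 100, r = 3.
print(peak_hfile_space(20, 100, 3))  # 720
```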

Thank you
Vidhya
