Yes, compactions happen on HDFS. HBase will only compact one region at a time per regionserver, so in theory you will need k × max(all region sizes).
But HDFS does a delayed delete, so the space used by deleted files is not freed immediately, and you could end up requiring much more disk space. Since HDFS disks should be the cheapest disks you own (data drives in a low-density configuration), hopefully it won't be hard to over-provision.

On May 17, 2010 11:57 AM, "Vidhyashankar Venkataraman" <vidhy...@yahoo-inc.com> wrote:

Hi guys,

I am quite new to HBase. I am trying to figure out the max additional disk space required for compactions.

If the set of small HFiles amounts to a size of U in total before a major compaction happens, and the 'behemoth' HFile has size M, then assuming the resultant size of the HFile after compaction is U+M (the worst case has only insertions) and a replication factor of r, the disk space taken by the HFiles is 2r(U+M). Is this estimate reasonable? (This is also based on my understanding that compactions happen on HDFS and not on the local file system: am I correct?)

Thank you
Vidhya
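
To make the 2r(U+M) estimate from the question concrete, here is a minimal sketch of the arithmetic in Python. It assumes the worst case described in the question (the compacted HFile is U+M in size) and that the input files still occupy disk while the output exists, as the delayed delete mentioned in the reply would imply. The function name, parameter names, and sample sizes are hypothetical, chosen only for illustration.

```python
def compaction_disk_estimate(u_bytes, m_bytes, replication):
    """Return (before, peak, extra) raw HDFS usage for one major compaction.

    Assumptions (from the thread, not measured):
      - the output HFile is u_bytes + m_bytes (insert-only worst case)
      - the input HFiles are still on disk when the output is complete
    """
    logical = u_bytes + m_bytes            # size of the inputs, and of the output
    before = replication * logical         # inputs only, replicated r times
    peak = 2 * replication * logical       # inputs and output coexist: 2r(U+M)
    extra = peak - before                  # additional space needed: r(U+M)
    return before, peak, extra


GIB = 1024 ** 3

# Hypothetical sizes: 2 GiB of small HFiles, an 8 GiB large HFile, r = 3.
before, peak, extra = compaction_disk_estimate(2 * GIB, 8 * GIB, 3)
print(before // GIB, peak // GIB, extra // GIB)  # prints: 30 60 30
```

Under these assumptions the additional space needed during the compaction is r(U+M), and the 2r(U+M) figure is the total footprint at the peak.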