Re: Additional disk space required for Hbase compactions..

TuX RaceR Tue, 18 May 2010 01:07:23 -0700

Thank you, Jonathan, for raising the Jira and attaching a patch.

I was looking for more info on how major compactions and minor compactions work, and Google found me this page:


http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture

After reading the wiki page and the Google Bigtable paper, it seems to me that there is a difference between Google 'minor compactions' and Hbase 'minor compactions'.


In google, a minor compaction is (from the paper):
"5.4 Compactions

As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS. This minor compaction process has two goals: it shrinks the memory usage of the tablet server, and it reduces the amount of data that has to be read from the commit log during recovery if this server dies. Incoming read and write operations can continue while compactions occur. Every minor compaction creates a new SSTable. If this behavior continued unchecked, read operations might need to merge updates from an arbitrary number of SSTables."


On the other hand the Hbase wiki:

"Compactions: When the number of MapFiles exceeds a configurable threshold, a minor compaction is performed which consolidates the most recently written MapFiles."


So it seems that:
1) Google minor compactions are equivalent to Hbase cache flushes
2) Google major compactions are equivalent to Hbase major compactions
3) there is no equivalent of Hbase minor compactions in the Google design.

Can somebody confirm this?

Since my data is almost immutable (i.e. there are few deleted rows, so I do not have much space to reclaim), I am wondering whether the compactions do more harm than good.


Thanks
TuX



On 17/05/10 23:12, Jonathan Gray wrote:

No there isn't.

I just opened a JIRA to make it so it can be set to 0 to disable.

https://issues.apache.org/jira/browse/HBASE-2559

Will put up a patch for trunk/0.21.
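
For illustration only, and assuming the change proposed in HBASE-2559 is applied and behaves as described above, the override in hbase-site.xml would presumably look something like this (a sketch, not taken from the patch itself):

```xml
<!-- Sketch for hbase-site.xml. Assumption: the running version treats 0 as
     "disabled", as proposed in HBASE-2559. Versions without that change
     may interpret 0 differently. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- 0 = no time-based major compactions (assumed semantics) -->
  <value>0</value>
</property>
```

With periodic major compactions off, they would have to be triggered by hand when needed, for example with major_compact from the HBase shell.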

JG

-----Original Message-----
From: TuX RaceR [mailto:tuxrace...@gmail.com]
Sent: Monday, May 17, 2010 1:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Additional disk space required for Hbase compactions..

Hello List,


On 17/05/10 20:26, Jonathan Gray wrote:

   Same with major compactions (you would definitely need to turn them off and control them manually if you need them at all).

How would you turn the major compaction off?
The only major compaction related parameter is this one:

<property>
<name>hbase.hregion.majorcompaction</name>
<value>86400000</value>
<description>The time (in milliseconds) between 'major' compactions of all
      HStoreFiles in a region.  Default: 1 day.
</description>
</property>

Is there a cleaner way to turn it off than putting a ridiculously large
value?

Thanks
TuX
