Thank you, Jonathan, for raising the JIRA and attaching a patch.
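
If I read HBASE-2559 correctly, once the patch is applied, turning off the time-based major compactions should look roughly like the fragment below. This is only a sketch: the value 0 meaning "disabled" is what the JIRA proposes, I am assuming the override goes in hbase-site.xml, and I would not expect it to work on a build without the patch.

```xml
<!-- Assumed location: hbase-site.xml, overriding the shipped default -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- 0 = disable periodic major compactions, as proposed in HBASE-2559 -->
  <value>0</value>
</property>
```

With this in place, major compactions would have to be triggered manually whenever they are needed.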

I was looking for more information on how major and minor compactions work, and Google found me this page:

http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture

After reading the wiki page and the Google Bigtable paper, it seems to me that there is a difference between Google 'minor compactions' and HBase 'minor compactions'.

In Google's design, a minor compaction is (from the paper):
"5.4 Compactions
As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS. This minor compaction process has two goals: it shrinks the memory usage of the tablet server, and it reduces the amount of data that has to be read from the commit log during recovery if this server dies. Incoming read and write operations can continue while compactions occur. Every minor compaction creates a new SSTable. If this behavior continued unchecked, read operations might need to merge updates from an arbitrary number of SSTables."

On the other hand, the HBase wiki says:
"Compactions: When the number of MapFiles exceeds a configurable threshold, a minor compaction is performed which consolidates the most recently written MapFiles."

So it seems that:
1) Google minor compactions are equivalent to HBase cache flushes
2) Google major compactions are equivalent to HBase major compactions
3) there is no equivalent of HBase minor compactions in the Google design.

Can somebody confirm this?
In my case the data is almost immutable (i.e. there are few deleted rows, so there is little space to reclaim), so I am wondering whether compactions do more harm than good.

Thanks
TuX



On 17/05/10 23:12, Jonathan Gray wrote:
No there isn't.

I just opened a JIRA to make it so it can be set to 0 to disable.

https://issues.apache.org/jira/browse/HBASE-2559

Will put up a patch for trunk/0.21.

JG

-----Original Message-----
From: TuX RaceR [mailto:tuxrace...@gmail.com]
Sent: Monday, May 17, 2010 1:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Additional disk space required for Hbase compactions..

Hello List,


On 17/05/10 20:26, Jonathan Gray wrote:
   Same with major compactions (you would definitely need to turn them
off and control them manually if you need them at all).

How would you turn the major compaction off?
The only major compaction related parameter is this one:

<property>
<name>hbase.hregion.majorcompaction</name>
<value>86400000</value>
<description>The time (in milliseconds) between 'major' compactions of all
      HStoreFiles in a region.  Default: 1 day.
</description>
</property>

Is there a cleaner way to turn it off than putting a ridiculously large
value?

Thanks
TuX
