Thank you, Jonathan, for raising the Jira and attaching a patch.
I was looking for more info on how major and minor compactions
work, and Google found me this page:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
After reading the wiki page and the Google Bigtable paper, it seems to me
that there is a difference between Google 'minor compactions' and HBase
'minor compactions'.
In Google's design, a minor compaction is (from the paper):
"5.4 Compactions
As write operations execute, the size of the memtable increases. When
the memtable size reaches a threshold, the memtable is frozen, a new
memtable is created, and the frozen memtable is converted to an SSTable
and written to GFS. This minor compaction process has two goals:
it shrinks the memory usage of the tablet server, and it reduces the
amount of data that has to be read from the commit log during recovery
if this server dies. Incoming read and write operations can continue
while compactions occur.
Every minor compaction creates a new SSTable. If this behavior continued
unchecked, read operations might need to merge updates from an arbitrary
number of SSTables."
On the other hand, the HBase wiki says:
"Compactions: When the number of MapFiles exceeds a configurable
threshold, a minor compaction is performed which consolidates the most
recently written MapFiles."
So it seems that:
1) Google minor compactions are equivalent to HBase cache flushes
2) Google major compactions are equivalent to HBase major compactions
3) there is no equivalent of HBase minor compactions in the Google design.
Can somebody confirm this?
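
To make the three operations in the list above concrete, here is a toy Python model. It is only a sketch: ToyStore, TOMBSTONE, and both thresholds are hypothetical names made up for illustration, not HBase or Bigtable APIs, and real stores work on sorted files rather than dicts. (For point 3, note that section 5.4 of the paper goes on to describe a "merging compaction" that rewrites a few SSTables into one, which may be the closer match for HBase minor compactions.)

```python
# Toy model only. ToyStore and TOMBSTONE are hypothetical names,
# not part of HBase or Bigtable.
TOMBSTONE = object()  # marker written in place of a deleted value


class ToyStore:
    def __init__(self, flush_threshold=2):
        self.memstore = {}   # in-memory buffer of recent writes
        self.files = []      # immutable "on-disk" files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a marker; space is reclaimed later.
        self.put(key, TOMBSTONE)

    def flush(self):
        """Bigtable 'minor compaction' / HBase cache flush:
        freeze the in-memory buffer and write it out as a new file."""
        if self.memstore:
            self.files.append(dict(self.memstore))
            self.memstore = {}

    def minor_compact(self, n=2):
        """HBase-style minor compaction: merge the n most recent files
        into one. Delete markers are kept, because older files that are
        not part of this merge may still hold the deleted values."""
        if n < 2 or len(self.files) < 2:
            return
        merged = {}
        for f in self.files[-n:]:   # oldest to newest, newer value wins
            merged.update(f)
        self.files[-n:] = [merged]

    def major_compact(self):
        """Major compaction: merge every file into one and drop deleted
        entries together with their markers."""
        merged = {}
        for f in self.files:
            merged.update(f)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.files = [live] if live else []
```

In this model, a data set with few deletes gains little space from major_compact(); what the compactions mainly buy is fewer files to consult per read.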
In my case the data is almost immutable (i.e. there is little space to
reclaim from deleted rows, as there are few of them), so I am
wondering whether the compactions do more harm than good.
Thanks
TuX
On 17/05/10 23:12, Jonathan Gray wrote:
No, there isn't.
I just opened a JIRA so that the setting can be set to 0 to disable major compactions.
https://issues.apache.org/jira/browse/HBASE-2559
Will put up a patch for trunk/0.21.
JG
-----Original Message-----
From: TuX RaceR [mailto:tuxrace...@gmail.com]
Sent: Monday, May 17, 2010 1:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Additional disk space required for Hbase compactions..
Hello List,
On 17/05/10 20:26, Jonathan Gray wrote:
Same with major compactions (you would definitely need to turn them
off and control them manually if you need them at all).
How would you turn major compactions off?
The only major-compaction-related parameter is this one:
<property>
<name>hbase.hregion.majorcompaction</name>
<value>86400000</value>
<description>The time (in milliseconds) between 'major' compactions of all
HStoreFiles in a region. Default: 1 day.
</description>
</description>
</property>
Is there a cleaner way to turn it off than setting a ridiculously large
value?
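
As a quick sanity check on the property above, the following Python sketch parses the fragment and converts the value to a duration, confirming that 86400000 ms is the one-day default mentioned in the description. The helper name major_compaction_interval is made up for illustration, and the snippet only reads the XML text; it does not talk to HBase.

```python
from datetime import timedelta
import xml.etree.ElementTree as ET

# The property as quoted above, without the description.
PROPERTY_XML = """
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value>
</property>
"""


def major_compaction_interval(property_xml):
    """Return the configured interval as a timedelta.
    Hypothetical helper: assumes the value is in milliseconds,
    as the property description states."""
    prop = ET.fromstring(property_xml.strip())
    millis = int(prop.findtext("value"))
    return timedelta(milliseconds=millis)


interval = major_compaction_interval(PROPERTY_XML)
print(interval == timedelta(days=1))
```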
Thanks
TuX