Jean-Adrien wrote:
Hello,

I have a question regarding the behavior of HBase at startup time.
First the region servers load all regions of enabled tables, then a batch
task of (minor?) compaction is run on some of these regions:

2008-12-17 11:04:46,688 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region
test-D-0.3,GST13927+129099482919-13927,1229196632010
2008-12-17 11:05:36,196 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
test-D-0.3,GST13927+129099482919-13927,1229196632010 in 49sec

Which regions are concerned? All of them? Only the regions that have
been modified since the last log roll?

All regions schedule a compaction on open (usually the compaction is 'minor', unless the 'major' interval has elapsed).
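
As a rough illustration of that rule, here is a minimal Python sketch; it is not HBase's actual code. The names `choose_compaction`, `last_major_ts` and `major_interval_s` are hypothetical. The only thing taken from this thread is that a compaction requested on open is minor unless the major interval has elapsed.

```python
def choose_compaction(now_s, last_major_ts, major_interval_s):
    """Return 'major' if the major-compaction interval has elapsed, else 'minor'.

    All arguments are in seconds. The names are illustrative, not HBase API.
    """
    if now_s - last_major_ts >= major_interval_s:
        return "major"
    return "minor"


DAY = 24 * 60 * 60
# A region opened 2 hours after its last major compaction, with a 1-day interval.
print(choose_compaction(now_s=2 * 3600, last_major_ts=0, major_interval_s=DAY))  # prints: minor
```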

We added this a while back for the following reason. Region opens are usually the result of a split. Splits are done by creating facades on the parent region's mapfiles. These facades -- or 'References' in hbase-speak -- reference the parent region's mapfiles; one facade serves up the top half of the parent's mapfiles while the other serves the bottom half. This mechanism makes splits run fast. The downside is that while these References are present in a region, the region is not splittable; this avoids a build-up of compound, fragile References-to-References relationships. Compactions clean up References by writing the content of the parent's top or bottom half into new mapfiles in the daughter regions.

During heavy-duty uploading, splits are fast and furious. To make regions splittable again as soon as possible, we were scheduling clean-up of References as fast as possible by immediately scheduling a compaction.
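
To make the mechanism concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not the real implementation: the classes `MapFile`, `Reference` and `Region` and the function `split` are hypothetical, and treating the "top" half as the keys at or above the split key is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MapFile:
    """A real store file holding sorted (key, value) pairs."""
    rows: list


@dataclass
class Reference:
    """A facade over one half of a parent's mapfile; it copies no data."""
    parent: MapFile
    split_key: str
    top: bool  # assumption: top = keys >= split_key, bottom = keys < split_key

    def scan(self):
        if self.top:
            return [(k, v) for k, v in self.parent.rows if k >= self.split_key]
        return [(k, v) for k, v in self.parent.rows if k < self.split_key]


@dataclass
class Region:
    files: list = field(default_factory=list)

    def splittable(self):
        # A region that still holds any Reference must not split again.
        return not any(isinstance(f, Reference) for f in self.files)

    def compact(self):
        # Rewrite everything, References included, into one new real mapfile.
        rows = []
        for f in self.files:
            rows.extend(f.scan() if isinstance(f, Reference) else f.rows)
        self.files = [MapFile(sorted(rows))]


def split(parent, split_key):
    """Fast split: the daughters get References only, no data is copied."""
    if not parent.splittable():
        raise ValueError("region still holds References; compact it first")
    bottom = Region([Reference(f, split_key, top=False) for f in parent.files])
    top = Region([Reference(f, split_key, top=True) for f in parent.files])
    return bottom, top


parent = Region([MapFile([("a", 1), ("b", 2), ("c", 3), ("d", 4)])])
bottom, top = split(parent, "c")
print(bottom.splittable())  # False: the daughter holds a Reference
bottom.compact()
print(bottom.splittable())  # True: the Reference was rewritten into a real mapfile
```

In this model the split itself touches no data, which is why it is fast, and the daughter only becomes splittable again once a compaction has copied its half of the parent into a file of its own.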

Missing from the above is special handling of startup. Andrew has started work on this in HBASE-1062.



In my case it takes several hours to complete, since I have about 500
regions for 2 region servers. And if I have understood correctly how Hadoop
works, this means that the entire HDFS content is rewritten during this phase,
since files are written once. Is that right?

Sounds like the original report in HBASE-938 (though that issue got hijacked to address a different problem). Do you think a major compaction is being triggered on each startup?

Was this a clean shutdown, Jean-Adrien?

As to rewriting all data, it shouldn't be. Before the HBASE-938 fix, we'd rewrite all data if the compaction was a major one, but not since that fix was committed.

TRUNK has improvements in this area including logging what type of compaction is running, whether major or minor.


If I disable and re-enable a table, must the compactions re-run?

Since regions are opened on re-enable, a compaction check will be scheduled, but if there is nothing to do, the compaction will be a no-op.
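
A minimal Python sketch of what "nothing to do" could look like follows. The function `compaction_check`, its parameters, and the `min_files` threshold are all assumptions made up for illustration; the thread does not spell out the actual criteria.

```python
def compaction_check(store_file_count, has_references, min_files=3):
    """Return True if a compaction would actually rewrite something.

    Assumption for illustration only: there is work to do when the store
    still holds References, or when it has at least `min_files` files to merge.
    """
    return has_references or store_file_count >= min_files


# A region of a re-enabled table that was already compacted down to one file.
print(compaction_check(store_file_count=1, has_references=False))  # prints: False
```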

St.Ack
