Re: HBase behaviour at startup (compression)

stack Wed, 17 Dec 2008 14:31:20 -0800

Jean-Adrien wrote:

Hello,


I have a question regarding the behavior of HBase at startup time.
First, the region servers load all regions of enabled tables; then a batch
of (minor?) compactions runs on some of these regions:

2008-12-17 11:04:46,688 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region test-D-0.3,GST13927+129099482919-13927,1229196632010
2008-12-17 11:05:36,196 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region test-D-0.3,GST13927+129099482919-13927,1229196632010 in 49sec

Which regions are concerned? All of them? Only the regions that have
been modified since the last log roll?

Every region schedules a compaction when it is opened (usually a 'minor' compaction, unless the 'major' compaction interval has elapsed).
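The minor-versus-major choice described above can be sketched as a simple time check. This is a minimal illustration, not HBase's code: `compaction_type_on_open` is a hypothetical helper, and the one-day interval is an assumed, illustrative value for what is really a configurable setting.

```python
import time

# Illustrative interval of one day in milliseconds; the real value is
# configurable and this default is an assumption.
MAJOR_INTERVAL_MS = 24 * 60 * 60 * 1000


def compaction_type_on_open(last_major_ms: int, now_ms: int,
                            interval_ms: int = MAJOR_INTERVAL_MS) -> str:
    """Hypothetical helper: pick the compaction scheduled when a region opens.

    Returns 'major' once the major interval has elapsed since the last major
    compaction, otherwise 'minor'.
    """
    if now_ms - last_major_ms >= interval_ms:
        return "major"
    return "minor"


now = int(time.time() * 1000)
print(compaction_type_on_open(now - 60_000, now))                 # minor
print(compaction_type_on_open(now - 2 * MAJOR_INTERVAL_MS, now))  # major
```

Because the check runs on every open, a restart reopens every region and so schedules one compaction per region, which matches the startup behaviour in the question.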

We added this a while back for the following reason. Region opens are usually the result of a split. Splits are done by creating facades on the parent region's mapfiles. These facades -- or 'References' in hbase-speak -- reference the parent region's mapfiles; one facade serves up the top half of the parent's mapfiles while the other serves the bottom half. This mechanism is what makes splits run fast. The downside is that while these References are present in a region, the region is not splittable, to avoid a build-up of compound, fragile References-to-References... relationships. Compactions clean up References by writing the content of the parent's top or bottom half into new mapfiles in the daughter regions. During heavy-duty uploading, splits are fast and furious. To keep regions splittable as soon as possible, we were cleaning up References as early as possible by immediately scheduling a compaction.
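A toy model of the References mechanism may make the trade-off concrete. It is a minimal sketch under simplifying assumptions: `MapFile`, `Reference`, `Region` and `split` are hypothetical names, each parent has a single sorted file, and the top half is assumed to start at the split key. It only shows why a split is cheap (no data is copied), why a region holding References is not splittable, and how a compaction removes them by rewriting the referenced half into a new local file.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[str, str]  # (row key, value)


@dataclass
class MapFile:
    """A sorted, write-once file of (key, value) pairs (stand-in for a mapfile)."""
    rows: List[Row]


@dataclass
class Reference:
    """Facade over a parent file: serves either the bottom or the top half."""
    parent: MapFile
    split_key: str
    top: bool  # assumption: top = keys >= split_key, bottom = keys < split_key

    def rows(self) -> List[Row]:
        keys = [k for k, _ in self.parent.rows]
        i = bisect_left(keys, self.split_key)
        return self.parent.rows[i:] if self.top else self.parent.rows[:i]


@dataclass
class Region:
    files: List[MapFile] = field(default_factory=list)
    refs: List[Reference] = field(default_factory=list)

    def splittable(self) -> bool:
        # Regions still holding References are not split again, so that
        # References-to-References chains never form.
        return not self.refs

    def compact(self) -> None:
        # Rewrite own files plus the referenced half into one new local file,
        # then drop the References.
        merged = [r for f in self.files for r in f.rows]
        merged += [r for ref in self.refs for r in ref.rows()]
        self.files = [MapFile(sorted(merged))]
        self.refs = []


def split(parent: MapFile, split_key: str) -> Tuple[Region, Region]:
    """Fast split: no data is copied; each daughter just gets a Reference."""
    bottom = Region(refs=[Reference(parent, split_key, top=False)])
    top = Region(refs=[Reference(parent, split_key, top=True)])
    return bottom, top


parent = MapFile([("a", "1"), ("c", "2"), ("m", "3"), ("x", "4")])
bottom, top = split(parent, "m")
print(bottom.splittable())   # False: still holds a Reference
bottom.compact()
print(bottom.splittable())   # True
print(bottom.files[0].rows)  # [('a', '1'), ('c', '2')]
```

The split itself only allocates two small facade objects; the cost of copying data is deferred to the compaction, which is why scheduling that compaction promptly keeps daughters splittable during heavy uploads.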

Missing from the above is special handling of startup. Andrew has started work on this in HBASE-1062.

In my case it takes several hours to complete, since I have about 500
regions for 2 region servers. And if I understand correctly how Hadoop
works, this means the entire HDFS content is rewritten during this phase,
since files are write-once. Is that right?
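A rough back-of-the-envelope check shows why "several hours" is plausible. This sketch assumes every region takes about as long as the single logged compaction (49 seconds), regions are spread evenly over the two servers, and each server compacts one region at a time; all three are assumptions, and `serial_compaction_hours` is a hypothetical helper.

```python
def serial_compaction_hours(regions: int, servers: int, secs_per_region: float) -> float:
    """Rough wall-clock hours if regions are spread evenly across servers and
    each server compacts its regions one at a time."""
    regions_per_server = regions / servers
    return regions_per_server * secs_per_region / 3600


print(round(serial_compaction_hours(500, 2, 49), 1))  # 3.4
```

That is 250 regions per server at 49 seconds each, or 12,250 seconds per server; larger or smaller regions would shift the estimate accordingly.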

Sounds like the original report on HBASE-938 (though the issue got hijacked to address a different issue). Do you think a major compaction is being triggered on each startup?


Was this a clean shutdown, Jean-Adrien?

As to rewriting all data, that shouldn't be happening. Before the HBASE-938 fix, we'd rewrite all data on a major compaction, but not since that fix was committed.

TRUNK has improvements in this area, including logging which type of compaction is running, major or minor.

If I disable and re-enable a table, must the compactions re-run?

Since regions are reopened on re-enable, a compaction check will be scheduled, but if there is nothing to do, the compaction will be a no-op.
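The "no-op if nothing to do" behaviour can be pictured as a cheap check that runs before any data is rewritten. This is a minimal sketch, not HBase's actual logic: `StoreState` and `compaction_check` are hypothetical, the file-count threshold of 3 is an assumed value, and the check deliberately ignores major-compaction scheduling.

```python
from dataclasses import dataclass

# Hypothetical threshold: compact once a store has at least this many files.
FILE_COUNT_THRESHOLD = 3


@dataclass
class StoreState:
    file_count: int       # number of store files currently on disk
    has_references: bool  # leftover References from a split


def compaction_check(s: StoreState, threshold: int = FILE_COUNT_THRESHOLD) -> str:
    # Leftover References must be cleaned up so the region becomes splittable.
    if s.has_references:
        return "compact"
    # Too many small files: merge them.
    if s.file_count >= threshold:
        return "compact"
    # Nothing to do: the scheduled check finishes without rewriting data.
    return "noop"


print(compaction_check(StoreState(file_count=1, has_references=False)))  # noop
print(compaction_check(StoreState(file_count=1, has_references=True)))   # compact
```

Under this model, disabling and re-enabling a table that was already compacted costs one inexpensive check per region rather than a rewrite of its data.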


St.Ack
