Hi,

These two both seem to be in a bit of a weird state: HMerge is scoped package-local, so no one outside the package can call its merge() functions, and nothing does except the unit test. It would be good to have this available on the CLI and in the shell as a command (in the shell maybe with a confirmation message?), but AFAIK it is not.
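As an aside, the size rule that HMerge applies (described in the next paragraph) boils down to a one-line check, as far as I can tell. Here is a minimal sketch, assuming the rule is exactly "combined size of the two regions is at most half the configured max file size". The class and method names are made up for illustration and are not HBase API, and the 256 MB limit is just an assumed value:

```java
public class MergeRule {
    // Hypothetical helper, not part of HBase: returns true when two adjacent
    // regions would qualify for merging under the rule described in this mail,
    // i.e. their combined size is <= half the configured max file size.
    static boolean qualifiesForMerge(long sizeA, long sizeB, long maxFileSize) {
        return sizeA + sizeB <= maxFileSize / 2;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long maxFileSize = 256 * MB; // assumed value, for illustration only

        // 64 MB + 64 MB = 128 MB, which equals half of 256 MB, so this qualifies.
        System.out.println(qualifiesForMerge(64 * MB, 64 * MB, maxFileSize));

        // 100 MB + 64 MB = 164 MB, which exceeds 128 MB, so this does not.
        System.out.println(qualifiesForMerge(100 * MB, 64 * MB, maxFileSize));
    }
}
```

The point is only that the check is cheap and purely size-based, which is why HMerge can sweep a whole disabled table rather than take two explicit region names.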
HMerge can merge regions of tables that are disabled. It also merges all regions that qualify, i.e. where the merged region is less than or equal to half the configured max file size.

Merge, on the other hand, does have a main(), so it can be invoked:

$ hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

Note how the help insinuates that you can use it as a tool, but that is not correct. Also, it only merges two given regions, and the cluster must be shut down (only the HBase daemons). So that is a step back.

What is worse is that I cannot get it to work. I tried in the shell:

hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
0 row(s) in 0.2640 seconds

hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
0 row(s) in 1.0450 seconds

hbase(main):003:0> flush 'testtable'
0 row(s) in 0.2000 seconds

hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
ROW                                   COLUMN+CELL
 testtable,,1309614509037.612d1e0112  column=info:regioninfo, timestamp=130...
 406e6c2bb482eeaec57322.              STARTKEY => '', ENDKEY => 'row-10'
 testtable,row-10,1309614509040.2fba  column=info:regioninfo, timestamp=130...
 fcc9bc6afac94c465ce5dcabc5d1.        STARTKEY => 'row-10', ENDKEY => 'row-20'
 testtable,row-20,1309614509041.e7c1  column=info:regioninfo, timestamp=130...
 6267eb30e147e5d988c63d40f982.        STARTKEY => 'row-20', ENDKEY => 'row-30'
 testtable,row-30,1309614509041.a9cd  column=info:regioninfo, timestamp=130...
 e1cbc7d1a21b1aca2ac7fda30ad8.        STARTKEY => 'row-30', ENDKEY => 'row-40'
 testtable,row-40,1309614509041.d458  column=info:regioninfo, timestamp=130...
 236feae097efcf33477e7acc51d4.        STARTKEY => 'row-40', ENDKEY => 'row-50'
 testtable,row-50,1309614509041.74a5  column=info:regioninfo, timestamp=130...
 7dc7e3e9602d9229b15d4c0357d1.        STARTKEY => 'row-50', ENDKEY => ''
6 row(s) in 0.0440 seconds

hbase(main):005:0> exit

$ ./bin/stop-hbase.sh

$ hbase org.apache.hadoop.hbase.util.Merge testtable \
  testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
  testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.

But I consistently get errors:

11/07/02 07:20:49 INFO util.Merge: Merging regions testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table testtable
11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
11/07/02 07:20:49 INFO wal.HLog: New hlog /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581, exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=1
info: null region1: [B@48fd918a region2: [B@7f5e2075
11/07/02 07:20:49 FATAL util.Merge: Merge failed
java.io.IOException: Could not find meta region for testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
    at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
    at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=1
11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
11/07/02 07:20:49 ERROR util.Merge: exiting due to error
java.lang.NullPointerException
    at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
    at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
    at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
    at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)

After that, most of the time I have shot .META., and the master reports this error:

2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors
java.io.FileNotFoundException: No status for hdfs://localhost:8020/hbase/.corrupt
    at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
    at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)

Lars