Hi,
HMerge and Merge both seem to be in a bit of a weird state. HMerge is scoped
package-local, so nothing outside the package can call its merge() functions,
and nothing does except the unit test. It would be good to have this as a
command on the CLI and in the shell (in the shell maybe with a confirmation
prompt?), but AFAIK it is not available.
HMerge can merge regions of tables that are disabled. It also merges all
regions that qualify, i.e. where the merged region is less than or equal to
half the configured max file size.
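As I read it, that qualification rule boils down to a check like the one
below. This is only a sketch of the rule as stated above, not the actual
HMerge code: the method name, the parameter names, the assumption that the
merged size is simply the sum of the two region sizes, and the 256 MB example
value are all made up for illustration.

```ruby
# Sketch of the rule described above, NOT the actual HMerge code.
# All names are hypothetical; sizes are in bytes.
def qualifies_for_merge?(size_a, size_b, max_file_size)
  # Assumption: the merged region is as large as both regions together.
  # It qualifies if that is at most half the configured max file size.
  (size_a + size_b) <= max_file_size / 2
end

mb  = 1024 * 1024
max = 256 * mb   # example value only

puts qualifies_for_merge?(40 * mb, 60 * mb, max)    # 100 MB vs. a 128 MB limit
puts qualifies_for_merge?(100 * mb, 60 * mb, max)   # 160 MB vs. a 128 MB limit
```

The first call is under the limit and the second is over it, so only the first
pair would be a merge candidate under this reading.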
Merge, on the other hand, does have a main(), so it can be invoked:
$ hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>
Note how the help implies that you can use it as a tool ("bin/hbase merge"),
but that is not correct. Also, it only merges the two given regions, and the
cluster must be shut down (only the HBase daemons). So that is a step back
compared to HMerge.
What is worse is that I cannot get it to work. I tried in the shell:
hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS =>
['row-10','row-20','row-30','row-40','row-50']}
0 row(s) in 0.2640 seconds
hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put 'testtable',
"row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
0 row(s) in 1.0450 seconds
hbase(main):003:0> flush 'testtable'
0 row(s) in 0.2000 seconds
hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
ROW COLUMN+CELL
testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
406e6c2bb482eeaec57322. STARTKEY => '', ENDKEY => 'row-10'
testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
fcc9bc6afac94c465ce5dcabc5d1. STARTKEY => 'row-10', ENDKEY => 'row-20'
testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
6267eb30e147e5d988c63d40f982. STARTKEY => 'row-20', ENDKEY => 'row-30'
testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
e1cbc7d1a21b1aca2ac7fda30ad8. STARTKEY => 'row-30', ENDKEY => 'row-40'
testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
236feae097efcf33477e7acc51d4. STARTKEY => 'row-40', ENDKEY => 'row-50'
testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
7dc7e3e9602d9229b15d4c0357d1. STARTKEY => 'row-50', ENDKEY => ''
6 row(s) in 0.0440 seconds
hbase(main):005:0> exit
$ ./bin/stop-hbase.sh
$ hbase org.apache.hadoop.hbase.util.Merge testtable \
testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
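The region names passed to Merge above are the ROW cells from the scan, which
the shell wraps across two lines. A small Ruby sketch of joining the two
halves and splitting a name into its parts follows. The helper names are
hypothetical, and the layout "<table>,<start key>,<region id>.<encoded name>."
is only inferred from the names shown in the scan output.

```ruby
# Hypothetical helpers for illustration, not part of HBase.

# Join the two wrapped halves of a ROW cell from the scan output.
def join_wrapped(first_half, second_half)
  first_half.strip + second_half.strip
end

# Split "<table>,<start key>,<region id>.<encoded name>." into its parts.
# Assumes the table name itself contains no comma.
def split_region_name(name)
  table, rest = name.split(',', 2)
  # The region id follows the LAST comma, since a start key may contain commas.
  idx = rest.rindex(',')
  start_key = rest[0...idx]
  region_id, encoded = rest[(idx + 1)..-1].chomp('.').split('.', 2)
  { table: table, start_key: start_key, region_id: region_id, encoded: encoded }
end

name = join_wrapped('testtable,row-20,1309614509041.e7c1',
                    '6267eb30e147e5d988c63d40f982.')
puts name
p split_region_name(name)
```

For the first region of the table the start key comes back as an empty string,
which matches STARTKEY => '' in the scan.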
But I consistently get errors:
11/07/02 07:20:49 INFO util.Merge: Merging regions
testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and
testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table testtable
11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB,
rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
11/07/02 07:20:49 INFO wal.HLog: New hlog
/Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available;
hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581,
exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config
now ...
11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next
sequenceid=1
info: null
region1: [B@48fd918a
region2: [B@7f5e2075
11/07/02 07:20:49 FATAL util.Merge: Merge failed
java.io.IOException: Could not find meta region for
testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config
now ...
11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next
sequenceid=1
11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
11/07/02 07:20:49 ERROR util.Merge: exiting due to error
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
After that, most of the time I have shot .META., and I get this error:
2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed
getting all descriptors
java.io.FileNotFoundException: No status for
hdfs://localhost:8020/hbase/.corrupt
at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
Lars