Hi,
I'm a relative newcomer to both HBase and Hadoop, so please bear with me
if some of my queries don't make sense.
I'm managing a small HBase cluster (1 dedicated master, 4 regionservers)
and am currently attempting to take a backup of the data (we can
regenerate the data in our HBase but it will take time). I've tried a
number of different approaches (details below) and I'm wondering if I've
missed an approach, or whether the approach I'm currently using is the
best available. All comments welcome.
I'm using HBase 0.19.3 running on top of Hadoop 0.19.1 and our HBase
contains a single table with about 50 million rows.
1. Initially, I came across
http://issues.apache.org/jira/browse/HBASE-897 which seemed like the
ideal way for us to back up our HBase installation while allowing it to
continue running. I ran into a number of problems with this, which I
suspect are due to my HBase cluster being underpowered (I first ran into
OutOfMemory exceptions; after bumping the JVM max heap size on the
client to 512MB, I then saw some java.lang.NullPointerExceptions during
the map phase - I'm not sure whether these are due to resource issues
on the HBase cluster or some underlying corruption in HBase).
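In case it matters, the 512MB heap bump on the client amounted to
something like the following in hbase-env.sh (the value is in MB); if
the backup job is instead launched via the hadoop script, HADOOP_HEAPSIZE
in hadoop-env.sh would be the corresponding setting:

export HBASE_HEAPSIZE=512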
After adding the following to HBase's environment (hbase-env.sh):

export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/home/hadoop/hbase/logs/gc-hbase.log"

and setting

<property>
<name>mapred.map.tasks</name>
<value>13</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>5</value>
</property>

in the Hadoop config on the system submitting the backup job, it seemed
to progress further, but ultimately died with various failures,
including the following:
java.io.IOException: All datanodes 192.168.1.2:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
which again suggests to me that maybe our cluster isn't beefy enough to
run HBase and the M/R job required to do the backup.
2. Given the lack of success with the M/R backup, I figured I'd shut
down HBase and try a copyToLocal of the entire /hbase tree.
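Roughly, the commands I ran were the following (the local destination
path is just an example):

$HBASE_HOME/bin/stop-hbase.sh
$HADOOP_HOME/bin/hadoop fs -copyToLocal /hbase /backup/hbase-20090817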
This failed after a few minutes with the following error:
09/08/17 17:53:07 INFO hdfs.DFSClient: No node available for block:
blk_7870832778982080356_55873
file=/hbase/log_192.168.1.3_1240589781392_60020/hlog.dat.1241539463091
09/08/17 17:53:07 INFO hdfs.DFSClient: Could not obtain block
blk_7870832778982080356_55873 from any node: java.io.IOException: No
live nodes contain current block
(and a bunch of other errors - all the same). This suggests to me that
there is some issue with our HBase and that some corruption has
occurred. Looking in JIRA, there seem to be a few reported instances
where this can occur with 0.19.3 / 0.19.1. I tried running HDFS fsck,
but it reports the entire filesystem as healthy. Is there anything I can
run to force HBase to verify its integrity and drop any rows affected by
the above problem?
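(For what it's worth, the fsck I ran was along the lines of

$HADOOP_HOME/bin/hadoop fsck / -files -blocks -locations

- the extra flags just print per-file block details, and everything
still came back as healthy.)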
3. Having failed with the copyToLocal, plan C was to try a distcp to
another cluster. Initial efforts with distcp failed with errors about
bad blocks again. I tried running distcp with the -i option (to ignore
errors) and the copy completed. I've configured HBase on the copy
destination to use the copied /hbase tree and it seems to start OK. I'm
currently running a count against the copied HBase table to see how
different it is from the original. Does it seem likely that my copy is
corrupt, or will HBase handle the missing blocks gracefully? How do
other people verify the integrity of their HBase? Are there tools like
fsck which can be run at the HBase level?
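For reference, the distcp and the count mentioned above were roughly
as follows (the namenode hosts/ports are placeholders for our two
clusters, and 'mytable' stands in for our real table name):

$HADOOP_HOME/bin/hadoop distcp -i hdfs://src-namenode:9000/hbase hdfs://backup-namenode:9000/hbase

and then, in the HBase shell on the copy destination:

count 'mytable'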
Any comments on my approach to backups are welcome - as I say, I'm far
from the top of this particular learning curve!
thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com