Hi,

I'm a relative newcomer to both HBase and Hadoop so please bear with me if some of my queries don't make sense.

I'm managing a small HBase cluster (1 dedicated master, 4 regionservers) and am currently attempting to take a backup of the data (we can regenerate the data in our HBase but it will take time). I've tried a number of different approaches (details below) - I'm wondering if I've missed an approach or whether the approach I'm using is the best. All comments welcome.

I'm using HBase 0.19.3 running on top of Hadoop 0.19.1 and our HBase contains a single table with about 50 million rows.

1. Initially, I came across http://issues.apache.org/jira/browse/HBASE-897, which seemed like the ideal way for us to back up our HBase installation while allowing it to continue running. I ran into a number of problems with this, which I suspect are due to my HBase cluster being underpowered. I first ran into OutOfMemory exceptions. After bumping the JVM max heap size on the client to 512MB, I saw some java.lang.NullPointerException errors during the map phase. I'm not sure whether these are due to resource issues on the HBase cluster or some underlying corruption in HBase.

After adding the following to HBase

export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/hbase/logs/gc-hbase.log"

and setting

  <property>
    <name>mapred.map.tasks</name>
    <value>13</value>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>5</value>
  </property>

in the Hadoop config on the system submitting the backup job, it seemed to progress further, but ultimately died with various failures, including the following:

java.io.IOException: All datanodes 192.168.1.2:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)

which again suggests to me that maybe our cluster isn't beefy enough to run HBase and the M/R job required to do the backup.

2. Given the lack of success with the M/R backup, I figured I'd shut down HBase and try a copyToLocal of the entire /hbase tree.

This failed after a few minutes with the following error:

09/08/17 17:53:07 INFO hdfs.DFSClient: No node available for block: blk_7870832778982080356_55873 file=/hbase/log_192.168.1.3_1240589781392_60020/hlog.dat.1241539463091
09/08/17 17:53:07 INFO hdfs.DFSClient: Could not obtain block blk_7870832778982080356_55873 from any node: java.io.IOException: No live nodes contain current block

(and a bunch of other errors, all the same). This suggests to me that there is some issue with our HBase and that some corruption has occurred. Looking in JIRA, there seem to be a few instances where this can occur in 0.19.3 / 0.19.1. I tried running HDFS fsck, but it reports the entire filesystem as healthy. Is there anything I can run to force HBase to verify its integrity and drop any rows affected by the above problem?
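To narrow down which files the DFSClient errors refer to, the file= paths can be pulled out of the copyToLocal output and each one checked with fsck individually. What follows is only a rough sketch, assuming the client output was saved to a file and that the hadoop command is on the PATH. The function names affected_files and fsck_affected are made up for illustration.

```shell
#!/bin/sh
# Print each distinct HDFS path named in a "file=" field of saved DFSClient output.
# Works whether or not the long log lines were wrapped.
# $1 = file containing the saved client output (hypothetical name, e.g. copy.log)
affected_files() {
    grep -o 'file=[^[:space:]]*' "$1" | sed 's/^file=//' | sort -u
}

# Run fsck with per-file and per-block detail on each affected path.
# This part needs a running HDFS cluster.
fsck_affected() {
    affected_files "$1" | while read -r path; do
        hadoop fsck "$path" -files -blocks -locations
    done
}
```

Run against the two log lines quoted above, affected_files would list only /hbase/log_192.168.1.3_1240589781392_60020/hlog.dat.1241539463091, which sits under a log_ directory rather than under the table's own directory.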

3. Having failed with the copyToLocal, plan C was to try a distcp to another cluster. Initial efforts with distcp failed with errors about bad blocks again. I tried running distcp with the -i option (to ignore errors) and the copy completed. I've configured HBase on the copy destination to use the copied hbase tree and it seems to start OK. I'm currently running a count against the copied HBase table to see how different it is from the original. Does it seem likely that my copy is corrupt, or will HBase handle the missing blocks gracefully? How do other people verify the integrity of their HBase? Are there tools like fsck which can be run at the HBase level?
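Once both row counts are available, comparing them is simple arithmetic. A minimal sketch, assuming the two counts have already been obtained from a count on each cluster. The function name count_diff and the sample numbers are invented for illustration and are not results from this cluster.

```shell
#!/bin/sh
# Report how many rows the copy is missing relative to the original.
# $1 = row count on the original cluster, $2 = row count on the copy.
count_diff() {
    awk -v src="$1" -v dst="$2" 'BEGIN {
        missing = src - dst
        # Guard against dividing by zero when the source count is 0.
        pct = (src > 0) ? 100 * missing / src : 0
        printf "missing=%d (%.4f%%)\n", missing, pct
    }'
}

# Example with made-up numbers:
count_diff 50000000 49990000   # prints: missing=10000 (0.0200%)
```

Matching counts would only show that the same number of rows is readable on both sides, not that the cell contents are identical.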

Any comments on my approach to backups are welcome. As I say, I'm far from the top of this particular learning curve!

thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
