My current setup in EC2 is a Hadoop MapReduce cluster and an HBase
cluster sharing the same HDFS. That is, I have one group of nodes that
run datanode and tasktracker, and another group that run datanode and
regionserver. I'm trying to move HBase off this cluster to a new
cluster with its own HDFS.

My plan is to shut down the cluster, copy the HFiles using distcp, and
then start up the new cluster. My problem is that it looks like it
will take several hours to transfer the more than 1 TB of data, and I
don't want to be offline that long. Is it possible to copy the HFiles
while the cluster is up? Do I need to take any special precautions? My
tentative plan is to stop any jobs that write to HBase, take whatever
tables I can offline, and leave the critical tables online but serving
reads only.
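
For concreteness, here is a rough sketch of the copy step I have in
mind, written as a small Python helper that only builds the distcp
command line. The namenode addresses, the /hbase root path, and the
map count are placeholders I made up for illustration, not my real
values. I'm assuming distcp's -update option (skip files already
present and unchanged at the destination) and -m option (cap the
number of map tasks) behave as documented.

```python
def build_distcp_command(src_namenode, dst_namenode,
                         hbase_root="/hbase", max_maps=20):
    """Build the argv list for copying the HBase root dir between clusters.

    All arguments are hypothetical placeholders; substitute real values.
    """
    src = "hdfs://{}{}".format(src_namenode, hbase_root)
    dst = "hdfs://{}{}".format(dst_namenode, hbase_root)
    # -update: copy only files that are missing or differ at the destination,
    #          so the same command can be re-run after an interrupted copy.
    # -m:      upper bound on the number of simultaneous map tasks.
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), src, dst]


if __name__ == "__main__":
    # Placeholder hostnames and ports, for illustration only.
    cmd = build_distcp_command("old-namenode:8020", "new-namenode:8020")
    print(" ".join(cmd))
```

The helper only prints the command; it does not run anything. Whether
a copy taken this way is consistent while regionservers are still
flushing and compacting is exactly what I'm unsure about.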

Jonathan Gray mentioned in
https://issues.apache.org/jira/browse/HBASE-1684 that he has
successfully copied the files with HBase running.
