Kevin,

Taking writes during the transition will be the issue.
If you don't take any writes, then you can flush all your tables and do an HDFS copy the same way. HBase doesn't actually have to be shut down; that's just recommended to prevent things from changing mid-backup. If you're careful not to write data, it should be OK.

JG

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, March 03, 2010 11:40 AM
To: hbase-user@hadoop.apache.org
Subject: Re: HFile backup while cluster running

If you disable writing, you can use org.apache.hadoop.hbase.mapreduce.Export to export all your data, copy the exported files to your new HDFS, then use org.apache.hadoop.hbase.mapreduce.Import, and finally switch your clients to the new HBase cluster.

On Wed, Mar 3, 2010 at 11:27 AM, Kevin Peterson <kpeter...@biz360.com> wrote:

> My current setup in EC2 is a Hadoop MapReduce cluster and an HBase
> cluster sharing the same HDFS. That is, I have a batch of nodes that
> run datanode and tasktracker, and a bunch of nodes that run datanode
> and regionserver. I'm trying to move HBase off this cluster to a new
> cluster with its own HDFS.
>
> My plan is to shut down the cluster, copy the HFiles using distcp, and
> then start up the new cluster. My problem is that it looks like it
> will take several hours to transfer the more than 1 TB of data, and I
> don't want to be offline that long. Is it possible to copy the HFiles
> while the cluster is up? Do I need to take any special precautions? I
> think my plan would be to turn off any jobs that write, take what
> tables I can offline, and leave the critical tables online but only
> serving reads.
>
> Jonathan Gray mentioned he has copied the files with HBase running
> successfully in https://issues.apache.org/jira/browse/HBASE-1684
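For concreteness, the two approaches discussed above look roughly like the sketches below. Table names, paths, and namenode addresses are placeholders, and a flush request from the shell is asynchronous, so give the regionservers a moment to finish writing HFiles before starting the copy.

Flush-and-copy route (no writes allowed during the copy):

    # Persist in-memory edits as HFiles, then copy the HBase root
    # directory to the new cluster's HDFS with distcp.
    echo "flush 'mytable'" | hbase shell          # repeat for each table
    hadoop distcp hdfs://old-nn:9000/hbase hdfs://new-nn:9000/hbase

Export/Import route, which dumps each table to sequence files that you copy and reload; Import expects the target table to already exist on the new cluster:

    # Export on the old cluster, copy the dump, then Import on the new cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable
    hadoop distcp hdfs://old-nn:9000/backup/mytable hdfs://new-nn:9000/backup/mytable
    hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backup/mytable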