On Tue, Aug 3, 2010 at 11:46 AM, Michael Segel <[email protected]> wrote:
>
>> Date: Tue, 3 Aug 2010 11:02:48 -0400
>> Subject: Re: Backing up HDFS
>> From: [email protected]
>> To: [email protected]
>>
>> Assuming you are taking the distcp approach, you can mirror your
>> cluster with some scripting/coding. However, your destination systems
>> can be more modest, assuming you wish to use them ONLY for data, no job
>> processing.
>
> And that would be a waste. (Why build a cloud just to store data and not do
> any processing?)
>
> You're not building your cloud in a vacuum. There are going to be SAN(s),
> other servers, tape??? available. The trick is getting the important data off
> the cloud to a place where it can be backed up via the corporation's standard
> IT practices.
>
> Because of the size of the data, you may see people pulling data off the
> cloud into a SAN, then to either a tape drive or a SATA hot-swap drive for
> off-site storage.
> It all depends on the value of the data.
>
> Again, YMMV
>
> HTH
>
> -Mike
>
> You're not building your cloud in a vacuum. There are going to be SAN(s),
> other servers, tape??? available. The trick is getting the important data
> off the cloud to a place where it can be backed up via the corporation's
> standard IT practices.

Right. It all depends on what you want and your needs. In my example I
wanted near-line backups for a lot of data that I could recover quickly,
hence a solution using distcp to a second cluster. If you want to integrate
with other backup software, you can do local copying or experiment with
fuse hadoop: mount the filesystem and back it up via traditional methods
(I just hope you have a lot of tapes :)
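The distcp-to-a-second-cluster approach mentioned above can be driven by a
small wrapper script run from cron. A minimal sketch follows; the NameNode
hostnames `prod-nn` and `backup-nn` and the `/data` path are hypothetical
placeholders, and the script only prints the command it would run, since
executing it requires two live clusters.

```shell
#!/bin/sh
# Sketch of a scripted distcp mirror between two clusters.
# "prod-nn" and "backup-nn" are assumed hostnames; adjust to your site.
SRC="hdfs://prod-nn:8020/data"
DST="hdfs://backup-nn:8020/data"

# -update copies only files that are missing or have changed on the
# destination, so a nightly cron run behaves like an incremental backup.
CMD="hadoop distcp -update $SRC $DST"

# Printed rather than executed here; in a real script you would run $CMD
# and check its exit status before declaring the backup good.
echo "$CMD"
```

A run like this can be scheduled nightly; because `-update` skips unchanged files, subsequent runs only move the delta.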
