MapR provides this out of the box in a completely Hadoop-compatible environment.
Doing this with straight Hadoop involves a fair bit of baling wire.

On Tue, Jan 3, 2012 at 1:10 PM, alo alt <wget.n...@googlemail.com> wrote:
> Hi Mac,
>
> HDFS has at the moment no solution for a complete backup-and-restore
> process in the sense of ITIL or ISO 9000. One strategy could be to "park"
> the HDFS data you want to back up, as you would on tape, by copying it with
> "distcp" to another backup cluster, and then snapshot it there with SAN
> mechanisms. For that, the DataNode storage has to be located on the SAN box.
>
> - Alex
>
> On Tuesday, January 3, 2012, Mac Noland <mcdonaldnol...@yahoo.com> wrote:
> > Good day,
> >
> > I’m guessing this question has been asked a myriad of times, but we’re
> > about to get serious with some of our Hadoop implementations, so I wanted
> > to re-ask to see if I’m missing anything, or if others happen to know
> > whether this might be on a future road map.
> >
> > For our current storage offerings (e.g. NAS or SAN), we give businesses
> > the opportunity to choose 7-, 14-, or 45-day “backups” for their storage.
> > The purpose of the backup isn’t so much that they are worried about losing
> > their current data (we’re RAID’ed and have some stuff mirrored to remote
> > datacenters), but more that if they were to delete some data today, they
> > can recover from yesterday’s backup, or the day before’s backup, or the
> > day before that, etc. And to be honest, business units buy a good portion
> > of their backups to make people feel better and fulfill custom contracts.
> >
> > So far with HDFS we haven’t found many formalized offerings for this
> > specific feature. While I haven’t done a ton of research, the best
> > solution I’ve found is an idea where we’d schedule a job to pull the data
> > locally to a mount that is backed up via our traditional methods.
> > See Michael Segel’s first post on this site:
> > http://lucene.472066.n3.nabble.com/Backing-up-HDFS-td1019184.html
> >
> > Though we’d have to work through the details of what this would look like
> > for our support folks, it looks like something that could potentially fit
> > into our current model. We’d basically need to allocate the same amount of
> > SAN or NAS disk as we have for HDFS, then coordinate a snap on the SAN or
> > NAS via our traditional methods. I’m not sure what a restore would look
> > like, other than that we could give the end users read access to the NAS or
> > SAN mounts so they can pick through what they need to recover and let them
> > figure out how to get it back into HDFS.
> >
> > For use cases like ours, where we’d need multi-day backups to fulfill
> > business needs, is this roughly what people are thinking or doing?
> > Moreover, is there anything in the Hadoop HDFS road map for providing,
> > for lack of a better word, an “enterprise” backup/restore solution?
> >
> > Thanks in advance,
> >
> > Mac Noland – Thomson Reuters
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
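For illustration, Alex's distcp suggestion combined with Mac's rolling N-day window might be sketched as follows. This is a minimal sketch under stated assumptions, not tooling anyone in the thread described. It assumes a second cluster reachable with `hadoop distcp`, and the cluster addresses, paths, and function names are hypothetical. It only builds the copy command and works out which dated copies fall outside the retention window. It does not run distcp, delete anything, or take the SAN snapshot.

```python
from __future__ import annotations

import datetime as dt
import shlex

# Hypothetical source and backup locations; substitute your own clusters/paths.
SOURCE = "hdfs://prod-nn:8020/data/sales"
BACKUP_ROOT = "hdfs://backup-nn:8020/backups/sales"


def backup_path(day: dt.date, root: str = BACKUP_ROOT) -> str:
    """Return the date-stamped directory that holds one day's copy."""
    return f"{root}/{day.isoformat()}"


def distcp_command(day: dt.date, source: str = SOURCE) -> list[str]:
    """Build (but do not run) a `hadoop distcp` copy of `source` into that day's directory."""
    return ["hadoop", "distcp", source, backup_path(day)]


def expired_backups(existing: list[dt.date], today: dt.date,
                    retention_days: int) -> list[dt.date]:
    """Return backup dates more than `retention_days` days old, oldest first.

    With retention_days=7 and today=2012-01-10, copies from 2012-01-03
    onward are kept and anything earlier is returned for pruning.
    """
    cutoff = today - dt.timedelta(days=retention_days)
    return sorted(d for d in existing if d < cutoff)


if __name__ == "__main__":
    today = dt.date.today()
    # Print the command a scheduler (e.g. cron) would run for today's copy.
    print(shlex.join(distcp_command(today)))
    # Pretend the backup root already holds the last 60 daily copies.
    history = [today - dt.timedelta(days=n) for n in range(60)]
    for day in expired_backups(history, today, retention_days=14):
        print("would remove", backup_path(day))
```

One date-stamped directory per day is what turns "recover yesterday's file" into a plain copy back out of that day's directory. Because each day here is a full copy, though, N days of retention needs roughly N times the data set's size on the backup side unless that storage deduplicates or snapshots. That is why Alex pairs distcp with SAN snapshots rather than keeping raw daily copies.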