There was some chatter on the HBase list about a dual hdfs/s3 driver class which would write to both but only read from hdfs. Of course, having this functionality at the Hadoop level would be better than in a subsidiary project.
Maybe the ability to specify a secondary filesystem in the hadoop-site.xml? Candidates might include S3, NFS, or of course, another HDFS in a geographically isolated location.

-- Jim R. Wilson (jimbojw)

On Fri, May 16, 2008 at 12:06 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Why not go to the next step and use a second cluster as the backup?
>
>
> On 5/16/08 6:33 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi,
>>
>> what are the options to keep a copy of data from an HDFS instance in
>> sync with a backup file system which is not HDFS? Are there rsync-like
>> tools that allow one to transfer only deltas, or would one have to
>> implement that oneself (e.g. by writing a Java program that accesses
>> both filesystems)?
>>
>> Thanks in advance,
>>
>> Robert
>>
>> P.S.: Why would one want that? E.g. to have a completely redundant copy
>> which, in case of a systematic failure (e.g. data corruption due to a
>> bug), offers a backup not affected by that problem.
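Absent an rsync equivalent, the "Java program that accesses both filesystems" Robert describes is essentially a modification-time delta check: walk the source tree and copy a file only when the backup copy is missing or older. A minimal sketch of that logic follows, written against java.nio.file so it is runnable as-is; against HDFS one would make the analogous calls on org.apache.hadoop.fs.FileSystem (listStatus and the FileStatus modification time). The class name DeltaSync and the mtime-only comparison (rather than checksums) are simplifying assumptions here:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Rsync-like one-way sync: copy a file from the source tree to the
// backup tree only when the backup copy is missing or has an older
// modification time. Returns the number of files copied.
public class DeltaSync {
    public static int sync(Path srcRoot, Path dstRoot) throws IOException {
        final int[] copied = {0};
        Files.walk(srcRoot)
             .filter(Files::isRegularFile)
             .forEach(src -> {
                 try {
                     Path dst = dstRoot.resolve(srcRoot.relativize(src));
                     boolean stale = !Files.exists(dst)
                         || Files.getLastModifiedTime(dst)
                                 .compareTo(Files.getLastModifiedTime(src)) < 0;
                     if (stale) {
                         Files.createDirectories(dst.getParent());
                         // COPY_ATTRIBUTES preserves the mtime so an
                         // unchanged file is skipped on the next run.
                         Files.copy(src, dst,
                                    StandardCopyOption.REPLACE_EXISTING,
                                    StandardCopyOption.COPY_ATTRIBUTES);
                         copied[0]++;
                     }
                 } catch (IOException e) {
                     throw new UncheckedIOException(e);
                 }
             });
        return copied[0];
    }
}
```

Note that an mtime comparison only catches ordinary updates; it would not detect the corruption scenario in Robert's P.S., which is precisely why a geographically isolated second copy (rather than a mirror of the live data) is the stronger answer.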
