HDFS doesn't allow random overwrites or appends. So even if HDFS were mountable, 
I'm guessing we couldn't just rsync to a DFS mount (I've never looked at the 
rsync code, but I assume it does appends/random writes). Any emulation of 
rsync would end up having to delete and recreate changed files in HDFS.
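That delete-and-recreate approach can be sketched roughly as below. This is a minimal illustration, not real Hadoop code: it uses ordinary local directories to stand in for HDFS, and the function names (`sync_to_writeonce_fs`, `file_digest`) are hypothetical. The point is only that a changed file is detected by checksum and then replaced whole, never patched in place.

```python
import hashlib
import os
import shutil

def file_digest(path):
    """MD5 of a file's contents, used to detect changed files."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_to_writeonce_fs(src_dir, dst_dir):
    """Mirror src_dir into dst_dir, treating dst_dir as write-once
    (the HDFS constraint): a changed file is deleted and rewritten
    in full, never appended to or overwritten in place."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue
        if os.path.exists(dst) and file_digest(src) == file_digest(dst):
            continue  # unchanged: skip, as rsync would
        if os.path.exists(dst):
            os.remove(dst)         # HDFS analogue: delete the old file
        shutil.copyfile(src, dst)  # HDFS analogue: write a fresh file whole
```

Note that unlike real rsync, which ships only changed blocks, this has to re-transfer every changed file in full; that is exactly the cost the write-once constraint imposes.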
 
If your data/processing is mostly log files, replication to HDFS can take 
advantage of some strong assumptions: a file only changes at the end, and one 
file can be converted into multiple files as long as the mapping can be 
inferred easily.
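Those two assumptions can be combined into a simple tail-shipping scheme, sketched below. This is a hypothetical illustration using local directories in place of HDFS (the name `ship_log_tail` and the `.part-N` naming convention are made up): since the log only grows at the end, each run copies just the new bytes into a fresh, immutable part file, and the original log can be reconstructed by concatenating the parts in order.

```python
import os

def ship_log_tail(log_path, dst_dir, state):
    """Copy bytes appended to log_path since the last call into a new
    part file under dst_dir. `state` maps log path -> offset already
    shipped. One source log becomes many destination files, named
    <log>.part-<N> so the mapping back is easy to infer."""
    os.makedirs(dst_dir, exist_ok=True)
    offset = state.get(log_path, 0)
    size = os.path.getsize(log_path)
    if size <= offset:
        return None  # nothing new since last run
    with open(log_path, "rb") as f:
        f.seek(offset)
        tail = f.read(size - offset)
    base = os.path.basename(log_path)
    part_no = len([n for n in os.listdir(dst_dir)
                   if n.startswith(base + ".part-")])
    part = os.path.join(dst_dir, "%s.part-%05d" % (base, part_no))
    with open(part, "wb") as out:  # HDFS analogue: create a new file
        out.write(tail)
    state[log_path] = size
    return part
```

Because every destination file is written once and never touched again, this scheme needs no appends or overwrites on the HDFS side at all.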
 
________________________________

From: Greg Connor [mailto:[EMAIL PROTECTED]
Sent: Wed 1/2/2008 7:03 AM
To: 'hadoop-user@lucene.apache.org'
Subject: Is there an rsyncd for HDFS



Hello,

Does anyone know of a modified "rsync" that gets/puts files to/from the DFS 
instead of a normal, mounted filesystem?  I'm guessing that since the DFS can't 
be mounted like a "normal" filesystem, rsync would need to be modified in 
order to access it, as with any other program.  We use rsync --daemon a lot for 
moving files around, making backups, etc., so I think it would be a logical 
fit... at least I hope so.

I'm new to Hadoop and just got my first standalone node configured.  Apologies 
if this has been answered before or if I'm missing something obvious.

Thanks
gregc
