HDFS doesn't allow random overwrites or appends. So even if HDFS were mountable, I'm guessing we couldn't just rsync to a DFS mount (I've never looked at the rsync code, but I assume it does appends/random writes). Any emulation of rsync would end up having to delete and recreate changed files in HDFS. If your data/processing is mostly log files, replication to HDFS can take advantage of some strong assumptions: the file only changes at the end, and one file can be converted to multiple files as long as the mapping can be inferred easily.

________________________________
From: Greg Connor [mailto:[EMAIL PROTECTED]
Sent: Wed 1/2/2008 7:03 AM
To: 'hadoop-user@lucene.apache.org'
Subject: Is there an rsyncd for HDFS

Hello,

Does anyone know of a modified "rsync" that gets/puts files to/from the dfs instead of the normal, mounted filesystems? I'm guessing since the dfs can't be mounted like a "normal" filesystem that rsync would need to be modified in order to access it, as with any other program.

We use rsync --daemon a lot for moving files around, making backups, etc. so I think it should be a logical fit... at least I hope so. I'm new to hadoop and just got my first standalone node configured. Apologies if this has been answered before, or if I'm missing something obvious.

Thanks
gregc
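The append-only replication idea from the reply (file only changes at the end; one source file may map to multiple destination files as long as the mapping is inferable) can be sketched roughly as follows. This is a minimal illustration in Python, not a real HDFS client: plain local writes stand in for HDFS puts, and all names here (`sync_append_only`, the part-file naming scheme) are hypothetical.

```python
import os

def sync_append_only(src_path, dest_dir, state):
    """Ship only the bytes appended to src_path since the last sync.

    state maps src_path -> byte offset already replicated. Each sync
    writes the new suffix as a separate part file whose name encodes
    the offset range, so the original file can be reassembled by
    concatenating parts in order -- the "one file to multiple files"
    mapping mentioned in the reply. Returns the part file path, or
    None if there was nothing new to ship.
    """
    start = state.get(src_path, 0)
    size = os.path.getsize(src_path)
    if size <= start:  # nothing appended (or file was truncated)
        return None
    with open(src_path, "rb") as f:
        f.seek(start)
        chunk = f.read(size - start)
    base = os.path.basename(src_path)
    part = os.path.join(dest_dir, "%s.%d-%d" % (base, start, size))
    with open(part, "wb") as out:  # stand-in for an HDFS put
        out.write(chunk)
    state[src_path] = size
    return part
```

Because each part file is written once and never modified, this sidesteps HDFS's lack of random writes entirely; a real implementation would replace the local write with `hadoop fs -put` (or the HDFS API) and persist the offset state somewhere durable.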