You don't want to use DFS on top of NFS. If you use DFS, keep its data
on the local drives, not in NFS. If you want to use NFS for shared
data, then simply don't use DFS: specify "local" as the filesystem and
don't start datanodes or a namenode.
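
For the no-DFS route, a conf/hadoop-site.xml along these lines is roughly what I mean. This is only a sketch: the jobtracker host and port are placeholders, and it assumes the NFS export is mounted at the same path on every node, since each task will open files by plain path name.

```xml
<!-- conf/hadoop-site.xml (sketch; master.example.com:9001 is a placeholder) -->
<configuration>
  <!-- Use the ordinary local/NFS-mounted filesystem instead of DFS. -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <!-- MapReduce still runs distributed: point every node at the jobtracker. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
</configuration>
```

With this in place you would start only the jobtracker and tasktrackers, and give input and output paths that live on the NFS mount.
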
I think you'll find DFS will perform better than NFS for crawling,
indexing, etc. If you like, at the end, you could copy the final index
from DFS onto your NFS server, if that's where you'd prefer to have it.
Does that help?
Doug
Adam Taylor wrote:
Hello, I've started to do some initial test runs with Hadoop 0.4.0, Nutch
0.8 and Nutchwax 0.6+. My setup includes several rack mount servers that
will be used for distributed indexing and a clustered file server that is
NFS mounted on each server. I would like for all of the hadoop slaves to
write the index to the file server (instead of to local disk).
I am curious, if the Hadoop master and its slaves will be accessing the
same file server to store the index, will it be possible to run the index
in distributed mode but specify "local" for the file system? I have tried
doing it this way and couldn't get it to work. It seems that all
documentation for Hadoop suggests using distributed mode for both the file
system and the indexing. However, if I try with a distributed file system
with my setup, each slave is writing to the same file server so we get a
conflict: "Cannot start multiple Datanode instances sharing the same data
directory"
Thanks!
Adam