You don't want to use DFS on top of NFS. If you use DFS, keep its data on the local drives, not in NFS. If you want to use NFS for shared data, then simply don't use DFS: specify "local" as the filesystem and don't start datanodes or a namenode.
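Something like this in your conf/hadoop-site.xml, roughly -- the property names are from memory for the 0.4.x line, so check them against hadoop-default.xml, and the jobtracker host/port below is just a placeholder:

  <configuration>
    <!-- Use the local filesystem (i.e. your NFS mount) instead of DFS. -->
    <property>
      <name>fs.default.name</name>
      <value>local</value>
    </property>
    <!-- MapReduce still runs distributed, pointed at your jobtracker. -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master.example.com:9001</value>
    </property>
  </configuration>

With that, the slaves read and write through the NFS mount directly, and there's no datanode or namenode to start.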

I think you'll find DFS will perform better than NFS for crawling, indexing, etc. If you like, at the end, you could copy the final index from DFS onto your NFS server, if that's where you'd prefer to have it.
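For example, something along these lines (the paths are just placeholders, and you should double-check the exact copy option in the dfs shell's usage output for your release):

  bin/hadoop dfs -get crawl/indexes /mnt/nfs/crawl/indexes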

Does that help?

Doug

Adam Taylor wrote:
Hello, I've started to do some initial test runs with Hadoop 0.4.0, Nutch
0.8 and Nutchwax 0.6+.  My setup includes several rack-mount servers that
will be used for distributed indexing and a clustered file server that is
NFS-mounted on each server.  I would like all of the Hadoop slaves to
write the index to the file server (instead of to local disk).

I am curious: if the Hadoop master and its slaves will all be accessing the
same file server to store the index, is it possible to run the indexing in
distributed mode but specify "local" for the file system?  I have tried it
this way and couldn't get it to work.  All of the documentation for Hadoop
seems to suggest using distributed mode for both the file system and the
indexing.  However, if I use the distributed file system with my setup,
each slave writes to the same file server and we get a conflict: "Cannot
start multiple Datanode instances sharing the same data directory"

Thanks!
Adam
