[ https://issues.apache.org/jira/browse/HADOOP-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550377 ]
Craig Macdonald commented on HADOOP-834:
----------------------------------------

Hello Dhruba,

It's best I explain what I did to get to where I am. When I originally read this issue I did not notice the attachment, hence the duplicated effort to implement an NFS-DFS gateway.

Firstly, like you, I looked around for an existing Java NFS implementation that I liked. I found another, jnfsd (http://members.aol.com/_ht_a/markmitche11/jnfsd.htm). This is also written in Java, and again restricted to NFSv2. What is interesting about this implementation, however, is that it is generated from a .x file. These .x files describe the C API and the RPC packets in a C-like syntax, and .c files can be built from them with the Unix tool rpcgen. RemoteTea (remotetea.sourceforge.net) can likewise compile .x files into .java files. Together with the jar file it provides, the generated Java classes handle all of the network plumbing, so that all that must be added is implementations of the appropriate procedures (i.e. READDIR, READ, WRITE, CREATE, etc.). jnfsd compiles nfs_prot.x (v2), so instead I downloaded nfs3_prot.x and ran it through jrpcgen. Since then I've been slowly adding implementations of the calls, but haven't had a chance to test it yet. (A small sketch of the DFS side of one such procedure is at the end of this comment.)

I have the following points to make in comparing the two approaches.

For the remotetea-based solution:
* Follows directly the C-like (.x) description of the RPC protocol, so could be brought up to NFSv4 in the future
* Remotetea handles the network API etc.

Against the current remotetea-based solution:
* Remotetea creates lots of objects when performing RPC calls, perhaps too many?
* Stuck within the Remotetea framework
* (I haven't finished it)
* Memory-based caching of handles - can be expensive, e.g. for du operations [du has no RPC call of its own, so requires recursive READDIR operations]

For Dhruba's solution:
* Easier to customise?
* Disk-based caching of NFS handles

Against Dhruba's solution:
* Harder to port to other NFS versions?

--

NFS writing semantics

I picked this up from the NFS RFC (http://www.faqs.org/rfcs/rfc1813.html):

  The NFS version 3 protocol introduces safe asynchronous writes on the
  server, when the WRITE procedure is used in conjunction with the COMMIT
  procedure. The COMMIT procedure provides a way for the client to flush data
  from previous asynchronous WRITE requests on the server to stable storage
  and to detect whether it is necessary to retransmit the data.

With the extended writing API for DFS that you describe, it seems we are making progress towards a suitable writing solution. For random-like writes I would indeed suggest replacement of blocks. My feeling is that we need to experiment with various NFS clients, particularly Linux, to determine how they write files for typical operations. People know that the DFS is designed for storing large files written as streams, so it is probably acceptable if a random write essentially requires a 64MB copy, update and replicate (i.e. slow). Both of these ideas are sketched below.
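To make the remotetea approach a little more concrete, here is a very rough sketch of the DFS side of a READ procedure. In the jrpcgen-generated code this body would live in a subclass of the generated server stub; I have left the generated class names out because I would only be guessing at what jrpcgen emits from nfs3_prot.x. DfsReadHelper and its method are names I have made up for illustration; only the Hadoop FileSystem calls are real API.

    // Sketch of the DFS side of an NFS READ handler. The caller (the
    // generated stub subclass) would resolve the NFS file handle to a Path
    // via the handle cache, call this, and pack the bytes into the reply.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsReadHelper {

        // Reads up to 'count' bytes at 'offset' from the given DFS file.
        public static byte[] read(FileSystem dfs, Path p, long offset, int count)
                throws IOException {
            FSDataInputStream in = dfs.open(p);
            try {
                byte[] buf = new byte[count];
                in.seek(offset);
                int n = in.read(buf, 0, count);
                if (n < count) {                  // short read near end of file
                    byte[] shorter = new byte[Math.max(n, 0)];
                    System.arraycopy(buf, 0, shorter, 0, shorter.length);
                    return shorter;
                }
                return buf;
            } finally {
                in.close();
            }
        }
    }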
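On the WRITE/COMMIT semantics, the gateway could stage asynchronous (UNSTABLE) WRITEs in memory and only push them into the write-once, sequential DFS output stream when the client sends a COMMIT. A minimal sketch, assuming the buffered chunks end up contiguous from the stream's current position (anything else needs the copy-and-rewrite fallback below); PendingWrites is a made-up name:

    // Stage UNSTABLE WRITEs for one open file; replay them on COMMIT.
    import java.io.IOException;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import org.apache.hadoop.fs.FSDataOutputStream;

    class PendingWrites {

        // offset -> data, kept sorted so we can replay in offset order
        private final SortedMap<Long, byte[]> chunks = new TreeMap<Long, byte[]>();

        // WRITE with stable == UNSTABLE: just remember the chunk.
        synchronized void write(long offset, byte[] data) {
            chunks.put(offset, data);
        }

        // COMMIT: flush the buffered chunks to the DFS stream in offset order.
        // Assumes they are contiguous from the stream's current position.
        synchronized void commit(FSDataOutputStream out) throws IOException {
            for (byte[] chunk : chunks.values()) {
                out.write(chunk);
            }
            chunks.clear();
            out.flush();
        }
    }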
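And for a genuinely random write that is neither sequential nor covered by the buffering above, the "copy, update and replicate" fallback could look roughly like the following. Since there is no public per-block write API, this sketch rewrites the whole file rather than a single block; the helper name and the ".nfswrite" temporary suffix are made up for illustration, and it is only sensible for files small enough to hold in memory.

    // Hypothetical fallback for a random write: read the whole file, patch
    // the affected byte range, write it back under a temporary name and
    // swap it into place. Slow by design, as noted above.
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RandomWriteHelper {

        public static void randomWrite(FileSystem dfs, Path file,
                                       long offset, byte[] data) throws IOException {
            // Copy: read the existing contents.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            FSDataInputStream in = dfs.open(file);
            byte[] buf = new byte[65536];
            int n;
            while ((n = in.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
            in.close();
            byte[] old = bos.toByteArray();

            // Update: patch (or extend) the affected byte range.
            int newLen = (int) Math.max(old.length, offset + data.length);
            byte[] patched = new byte[newLen];
            System.arraycopy(old, 0, patched, 0, old.length);
            System.arraycopy(data, 0, patched, (int) offset, data.length);

            // Replicate: rewrite under a temporary name, then rename over
            // the original (block replication is handled by the DFS itself).
            Path tmp = new Path(file.toString() + ".nfswrite");
            FSDataOutputStream out = dfs.create(tmp);
            out.write(patched);
            out.close();
            dfs.delete(file);
            dfs.rename(tmp, file);
        }
    }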
Craig

> Export the HDFS file system through a NFS protocol
> --------------------------------------------------
>
>                 Key: HADOOP-834
>                 URL: https://issues.apache.org/jira/browse/HADOOP-834
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: nfshadoop.tar.gz
>
>
> It would be nice if we could expose the HDFS filesystem using the NFS protocol. There are a couple of options that I could find:
> 1. Use a user space C-language implementation of an NFS server and then use the libhdfs API to integrate that code with Hadoop. There is such an implementation available at http://sourceforge.net/project/showfiles.php?group_id=66203.
> 2. Use a user space Java implementation of an NFS server and then integrate it with HDFS using the Java API. There is such an implementation of an NFS server at http://void.org/~steven/jnfs/.
> I have experimented with Option 2 and have written a first version of the Hadoop integration. I am attaching the code for your preliminary feedback. This implementation of the Java NFS server has one limitation: it supports UDP only. Some licensing issues will have to be sorted out before it can be used. Steve (the writer of the NFS server implementation) has told me that he can change the licensing of the code if needed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.