[ https://issues.apache.org/jira/browse/HADOOP-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550377 ]
Craig Macdonald commented on HADOOP-834:
----------------------------------------

Hello Dhruba,

It's best I explain what I did to get to where I am. When I originally read this issue I did not notice the attachment, hence the duplicated effort to implement an NFS-DFS gateway.

Firstly, like you, I looked around for an existing Java NFS implementation that I liked. I found another, jnfsd (http://members.aol.com/_ht_a/markmitche11/jnfsd.htm). This is also written in Java, and again restricted to NFSv2. What is interesting about this implementation, however, is that it is generated from a .x file. These .x files describe the C API and the RPC packets in a C-like syntax, and .c files can be built from them with the Unix tool rpcgen. RemoteTea (remotetea.sourceforge.net) can likewise compile .x files into .java files. Together with the jar file it provides, the generated Java classes handle all of the network plumbing, so that all that must be added is implementations of the appropriate procedures (i.e. READDIR, READ, WRITE, CREATE, etc.). jnfsd compiles nfs_prot.x (v2), so instead I downloaded nfs3_prot.x and ran it through jrpcgen. Since then I've been slowly adding implementations of the calls, but haven't had a chance to test it yet. (A small sketch of the DFS side of one such procedure is at the end of this comment.)

I have the following points to make in comparing the two approaches.

For the remotetea-based solution:
* Follows directly the C-like (.x) description of the RPC protocol, so could be brought up to NFSv4 in the future
* Remotetea handles the network API etc.

Against the current remotetea-based solution:
* Remotetea creates lots of objects when performing RPC calls, perhaps too many?
* Stuck within the Remotetea framework
* (I haven't finished it)
* Memory-based caching of handles - can be expensive, e.g. for du operations [du has no RPC call of its own, so requires recursive READDIR operations]

For Dhruba's solution:
* Easier to customise?
* Disk-based caching of NFS handles

Against Dhruba's solution:
* Harder to port to other NFS versions?

--

NFS writing semantics

I picked this up from the NFS RFC (http://www.faqs.org/rfcs/rfc1813.html):

  The NFS version 3 protocol introduces safe asynchronous writes on the
  server, when the WRITE procedure is used in conjunction with the COMMIT
  procedure. The COMMIT procedure provides a way for the client to flush data
  from previous asynchronous WRITE requests on the server to stable storage
  and to detect whether it is necessary to retransmit the data.

With the extended writing API for DFS that you describe, it seems we are making progress towards a suitable writing solution. For random-like writes I would indeed suggest replacement of blocks. My feeling is that we need to experiment with various NFS clients, particularly Linux, to determine how they write files for typical operations. People know that the DFS is designed for storing large files written as streams, so it is probably acceptable if a random write essentially requires a 64MB copy, update and replicate (i.e. slow). Both of these ideas are sketched below.
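To make the remotetea approach a little more concrete, here is a very rough sketch of the DFS side of a READ procedure. In the jrpcgen-generated code this body would live in a subclass of the generated server stub; I have left the generated class names out because I would only be guessing at what jrpcgen emits from nfs3_prot.x. DfsReadHelper and its method are names I have made up for illustration; only the Hadoop FileSystem calls are real API.

    // Sketch of the DFS side of an NFS READ handler. The caller (the
    // generated stub subclass) would resolve the NFS file handle to a Path
    // via the handle cache, call this, and pack the bytes into the reply.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsReadHelper {

        // Reads up to 'count' bytes at 'offset' from the given DFS file.
        public static byte[] read(FileSystem dfs, Path p, long offset, int count)
                throws IOException {
            FSDataInputStream in = dfs.open(p);
            try {
                byte[] buf = new byte[count];
                in.seek(offset);
                int n = in.read(buf, 0, count);
                if (n < count) {                  // short read near end of file
                    byte[] shorter = new byte[Math.max(n, 0)];
                    System.arraycopy(buf, 0, shorter, 0, shorter.length);
                    return shorter;
                }
                return buf;
            } finally {
                in.close();
            }
        }
    }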
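On the WRITE/COMMIT semantics, the gateway could stage asynchronous (UNSTABLE) WRITEs in memory and only push them into the write-once, sequential DFS output stream when the client sends a COMMIT. A minimal sketch, assuming the buffered chunks end up contiguous from the stream's current position (anything else needs the copy-and-rewrite fallback below); PendingWrites is a made-up name:

    // Stage UNSTABLE WRITEs for one open file; replay them on COMMIT.
    import java.io.IOException;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import org.apache.hadoop.fs.FSDataOutputStream;

    class PendingWrites {

        // offset -> data, kept sorted so we can replay in offset order
        private final SortedMap<Long, byte[]> chunks = new TreeMap<Long, byte[]>();

        // WRITE with stable == UNSTABLE: just remember the chunk.
        synchronized void write(long offset, byte[] data) {
            chunks.put(offset, data);
        }

        // COMMIT: flush the buffered chunks to the DFS stream in offset order.
        // Assumes they are contiguous from the stream's current position.
        synchronized void commit(FSDataOutputStream out) throws IOException {
            for (byte[] chunk : chunks.values()) {
                out.write(chunk);
            }
            chunks.clear();
            out.flush();
        }
    }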
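And for a genuinely random write that is neither sequential nor covered by the buffering above, the "copy, update and replicate" fallback could look roughly like the following. Since there is no public per-block write API, this sketch rewrites the whole file rather than a single block; the helper name and the ".nfswrite" temporary suffix are made up for illustration, and it is only sensible for files small enough to hold in memory.

    // Hypothetical fallback for a random write: read the whole file, patch
    // the affected byte range, write it back under a temporary name and
    // swap it into place. Slow by design, as noted above.
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RandomWriteHelper {

        public static void randomWrite(FileSystem dfs, Path file,
                                       long offset, byte[] data) throws IOException {
            // Copy: read the existing contents.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            FSDataInputStream in = dfs.open(file);
            byte[] buf = new byte[65536];
            int n;
            while ((n = in.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
            in.close();
            byte[] old = bos.toByteArray();

            // Update: patch (or extend) the affected byte range.
            int newLen = (int) Math.max(old.length, offset + data.length);
            byte[] patched = new byte[newLen];
            System.arraycopy(old, 0, patched, 0, old.length);
            System.arraycopy(data, 0, patched, (int) offset, data.length);

            // Replicate: rewrite under a temporary name, then rename over
            // the original (block replication is handled by the DFS itself).
            Path tmp = new Path(file.toString() + ".nfswrite");
            FSDataOutputStream out = dfs.create(tmp);
            out.write(patched);
            out.close();
            dfs.delete(file);
            dfs.rename(tmp, file);
        }
    }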
Craig

> Export the HDFS file system through a NFS protocol
> --------------------------------------------------
>
>                 Key: HADOOP-834
>                 URL: https://issues.apache.org/jira/browse/HADOOP-834
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: nfshadoop.tar.gz
>
>
> It would be nice if we could expose the HDFS filesystem using the NFS protocol. There are a couple of options that I could find:
> 1. Use a user space C-language implementation of an NFS server and then use the libhdfs API to integrate that code with Hadoop. There is such an implementation available at http://sourceforge.net/project/showfiles.php?group_id=66203.
> 2. Use a user space Java implementation of an NFS server and then integrate it with HDFS using the Java API. There is such an implementation of an NFS server at http://void.org/~steven/jnfs/.
> I have experimented with Option 2 and have written a first version of the Hadoop integration. I am attaching the code for your preliminary feedback. This implementation of the Java NFS server has one limitation: it supports UDP only. Some licensing issues will have to be sorted out before it can be used. Steve (the writer of the NFS server implementation) has told me that he can change the licensing of the code if needed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.