[
https://issues.apache.org/jira/browse/HADOOP-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550377
]
Craig Macdonald commented on HADOOP-834:
----------------------------------------
Hello Dhruba,
It's best if I explain what I did to get to where I am. Originally, when I read
this issue, I did not notice the attachment, hence the duplicated effort on an
NFS-DFS gateway implementation.
Firstly, like you, I looked around for an existing Java NFS implementation that
I liked. I found another, jnfsd
(http://members.aol.com/_ht_a/markmitche11/jnfsd.htm). This is also written in
Java, but again restricted to NFSv2.
However, what is interesting about this NFS implementation is that it is based
on a .x file. These .x files describe the C API and the RPC packets in a C-like
syntax, and .c files can be generated from them using the Unix tool rpcgen.
RemoteTea (remotetea.sourceforge.net) can also compile .x files into .java
files. Together with the jar file it provides, these generated Java files
implement objects representing all the network connectivity, so that all that
must be added is implementations of the appropriate methods (i.e. READDIR,
READ, WRITE, CREATE, etc.).
jnfsd compiles nfs_prot.x (v2). So instead, I downloaded nfs3_prot.x and ran it
through jrpcgen. Since then, I've been slowly adding implementations of calls,
but haven't had a chance to test it yet.
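To give a flavour of what those per-call implementations amount to, below is a
rough, untested sketch of the HDFS side of a GETATTR-style handler. The Attr
holder is just a stand-in for the fattr3 structure jrpcgen would generate from
nfs3_prot.x, and the handle-to-path resolution is left out; only the Hadoop
FileSystem calls are real.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAttrLookup {

    /** Minimal stand-in for the fattr3 structure jrpcgen would generate. */
    public static class Attr {
        public boolean isDirectory;
        public long size;
        public long mtimeMillis;
    }

    private final FileSystem fs;

    public HdfsAttrLookup(Configuration conf) throws IOException {
        this.fs = FileSystem.get(conf);
    }

    /**
     * What the generated GETATTR procedure would delegate to, once the NFS
     * file handle has been resolved to an HDFS path: a single DFS metadata
     * call, copied into the NFS attribute structure.
     */
    public Attr getattr(Path p) throws IOException {
        FileStatus st = fs.getFileStatus(p);
        Attr a = new Attr();
        a.isDirectory = st.isDir();
        a.size = st.getLen();
        a.mtimeMillis = st.getModificationTime();
        return a;
    }
}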
I have the following points to make to compare the two:
For the RemoteTea-based solution:
* Follows the .x description of the RPC protocol directly, so it could be
brought up to NFS v4 in the future
* RemoteTea handles the network API etc.
Against the current RemoteTea-based solution:
* RemoteTea creates lots of objects when performing RPC calls, perhaps too
many?
* Stuck within the RemoteTea framework
* (I haven't finished it)
* Memory-based caching of handles - can be expensive, e.g. for du operations
[du has no RPC call, so requires recursive READDIR ops]; a rough sketch of
such a cache follows this list
For Dhruba's solution:
* Easier to customise?
* Disk-based caching of NFS handles
Against Dhruba's solution:
* Harder to port to other NFS versions?
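For completeness, this is roughly the kind of in-memory handle cache I mean
(the names and structure are mine, purely illustrative, and not taken from
either attachment): two hash maps keyed by handle and by path, so every handle
ever issued stays resident until the server restarts, and a recursive READDIR
walk like the one du triggers touches every entry under the tree.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;

/**
 * Purely in-memory NFS file-handle cache (illustrative only). Every handle
 * handed out to a client must stay resolvable, so both maps grow with every
 * directory entry touched, e.g. by a recursive READDIR traversal.
 */
public class MemoryHandleCache {

    private final Map<Long, Path> handleToPath = new HashMap<Long, Path>();
    private final Map<Path, Long> pathToHandle = new HashMap<Path, Long>();
    private long nextHandle = 1;

    public synchronized long handleFor(Path p) {
        Long h = pathToHandle.get(p);
        if (h == null) {
            h = Long.valueOf(nextHandle++);
            pathToHandle.put(p, h);
            handleToPath.put(h, p);
        }
        return h.longValue();
    }

    public synchronized Path pathFor(long handle) {
        // null means the handle is unknown, i.e. NFS3ERR_STALE
        return handleToPath.get(Long.valueOf(handle));
    }
}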
--
NFS writing semantics
I picked this up from the NFS RFC (http://www.faqs.org/rfcs/rfc1813.html):
The NFS version 3 protocol introduces safe asynchronous writes
on the server, when the WRITE procedure is used in conjunction
with the COMMIT procedure. The COMMIT procedure provides a way
for the client to flush data from previous asynchronous WRITE
requests on the server to stable storage and to detect whether
it is necessary to retransmit the data.
With the introduction of the extended writing API you describe for the DFS, it
seems we are making progress towards a suitable writing solution. I would
indeed suggest replacing blocks wholesale for random-style writes. My feeling
is that we need to experiment with various NFS clients, particularly Linux, to
determine how they write files for typical operations. People know that the
DFS is designed for storing large files as streams, so it's probably acceptable
if a random write essentially requires a 64MB block copy, update and replicate
(i.e. slow).
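As a very rough, untested sketch of what I mean (all names are mine, and it
deliberately assumes the client writes sequentially from offset 0): UNSTABLE
WRITEs could be buffered on the gateway and only pushed into a DFS output
stream when the client sends COMMIT, at which point the data has to be on
stable storage. Random offsets would fall back to the slow
copy/update/replicate path above.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative only: accumulate UNSTABLE NFS WRITEs for one file in memory
 * and flush the whole thing to HDFS when the client issues COMMIT.
 */
public class SequentialWriteBuffer {

    private final FileSystem fs;
    private final Path target;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    public SequentialWriteBuffer(Configuration conf, Path target) throws IOException {
        this.fs = FileSystem.get(conf);
        this.target = target;
    }

    /** Called from the WRITE procedure for the UNSTABLE case. */
    public synchronized void write(byte[] data, int off, int len) {
        pending.write(data, off, len);
    }

    /** Called from the COMMIT procedure: data must reach stable storage. */
    public synchronized void commit() throws IOException {
        OutputStream out = fs.create(target, true); // overwrite previous contents
        try {
            pending.writeTo(out);
        } finally {
            out.close(); // closing the stream pushes the blocks into the DFS
        }
    }
}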
Craig
> Export the HDFS file system through a NFS protocol
> --------------------------------------------------
>
> Key: HADOOP-834
> URL: https://issues.apache.org/jira/browse/HADOOP-834
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: nfshadoop.tar.gz
>
>
> It would be nice if we can expose the HDFS filesystem using the NFS protocol.
> There are a couple of options that I could find:
> 1. Use a user-space C-language implementation of an NFS server and then use
> the libhdfs API to integrate that code with Hadoop. There is such an
> implementation available at
> http://sourceforge.net/project/showfiles.php?group_id=66203.
> 2. Use a user-space Java implementation of an NFS server and then integrate
> it with HDFS using the Java API. There is such an implementation of an NFS server at
> http://void.org/~steven/jnfs/.
> I have experimented with Option 2 and have written a first version of the
> Hadoop integration. I am attaching the code for your preliminary feedback.
> This implementation of the Java NFS server has one limitation: it supports
> UDP only. Some licensing issues will have to be sorted out before it can be
> used. Steve (the writer of the NFS server implementation) has told me that he
> can change the licensing of the code if needed.
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.