Hi Colin,

Yeah, I should add the reasons to the README. We tried LocalFileSystem when we started out, but we think we can achieve tighter Hadoop integration by writing a dedicated connector.
Some examples include:

1. Limit over-prefetching of data - MapReduce divides jobs into 128MB splits, and the standard NFS driver tends to over-prefetch from a file. We limit prefetching to the split size.

2. Lazy write commits - For writes, we can relax the write guarantees (making writes faster) and commit just before the task ends.

3. Provide location awareness - Later, we can hook some NFS smarts into getFileBlockLocations() (we have some ideas but haven't implemented them yet).

Hope this helps.

Gokul

On Wed, Jan 14, 2015 at 10:47 AM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
> Why not just use LocalFileSystem with an NFS mount (or several)? I read
> through the README but I didn't see that question answered anywhere.
>
> best,
> Colin
>
> On Tue, Jan 13, 2015 at 1:35 PM, Gokul Soundararajan <gokulsoun...@gmail.com> wrote:
>
> > Hi,
> >
> > We (Jingxin Feng, Xing Lin, and I) have been working on providing a
> > FileSystem implementation that allows Hadoop to use an NFSv3 storage
> > server as a filesystem. It leverages code from the hadoop-nfs project
> > for all the request/response handling. We would like your help to add
> > it as part of hadoop tools (similar to hadoop-aws and hadoop-azure).
> >
> > In more detail, the Hadoop NFS Connector allows Apache Hadoop (2.2+) and
> > Apache Spark (1.2+) to use an NFSv3 storage server as a storage endpoint.
> > The NFS Connector can run in two modes: (1) secondary filesystem, where
> > Hadoop/Spark runs using HDFS as its primary storage and can use NFS as a
> > second storage endpoint, and (2) primary filesystem, where Hadoop/Spark
> > runs entirely on an NFSv3 storage server.
> >
> > The code is written so that existing applications do not have to change.
> > All one has to do is copy the connector jar into the lib/ directory of
> > Hadoop/Spark, then modify core-site.xml to provide the necessary details.
> >
> > The current version can be seen at:
> > https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
> >
> > It is my first time contributing to the Hadoop codebase. It would be
> > great if someone on the Hadoop team could guide us through this process.
> > I'm willing to make the necessary changes to integrate the code. What
> > are the next steps? Should I create a JIRA entry?
> >
> > Thanks,
> >
> > Gokul
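
[Editor's note: as a rough illustration of the core-site.xml change mentioned above, the entry might look something like the sketch below. The property names, the nfs:// scheme, and the endpoint URI are assumptions following Hadoop's usual fs.<scheme>.impl convention, not the connector's documented configuration; consult the project README for the actual keys.]

```xml
<!-- Sketch of a core-site.xml entry for an NFS-backed filesystem.
     Property names, the nfs:// scheme, and the server URI below are
     illustrative assumptions, not the connector's documented settings. -->
<configuration>
  <!-- Map the nfs:// URI scheme to the connector's FileSystem class -->
  <property>
    <name>fs.nfs.impl</name>
    <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
  </property>
  <!-- Primary-filesystem mode: point the default filesystem at the NFS
       server. In secondary mode, leave fs.defaultFS on HDFS and address
       NFS paths explicitly via nfs:// URIs instead. -->
  <property>
    <name>fs.defaultFS</name>
    <value>nfs://nfs-server.example.com:2049/</value>
  </property>
</configuration>
```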