Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "MountableHDFS" page has been changed by Remis:
https://wiki.apache.org/hadoop/MountableHDFS?action=diff&rev1=18&rev2=19

Comment:
Add some more FUSE implementations

- These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of hdfs using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.
+ These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of hdfs using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc. All, except HDFS NFS Proxy, are based on the Filesystem in Userspace project FUSE ([[http://fuse.sourceforge.net/]]). Although the Webdav-based one can be used with other webdav tools, it requires FUSE to actually mount.

@@ -28, +28 @@
  * webdav - hdfs exposed as a webdav resource
  * mapR - contains a closed source hdfs compatible file system that supports read/write NFS access
  * [[https://github.com/cloudera/hdfs-nfs-proxy|HDFS NFS Proxy]] - exports HDFS as NFS without use of fuse. Supports Kerberos and re-orders writes so they are written to hdfs sequentially.
+ * hadoofus - a FUSE implementation in C for hadoop 0.20.203 to 1.0.3
+ * native-hdfs-fuse - a FUSE implementation in C that supports random writes
  == Supported Operating Systems ==

@@ -38, +40 @@
  Supports reads, writes, and directory operations (e.g., cp, ls, more, cat, find, less, rm, mkdir, mv, rmdir). Things like touch, chmod, chown, and permissions are in the works. Fuse-dfs currently shows all files as owned by nobody.
- == Contributing ==
+ === Contributing ===
  It's pretty straightforward to add functionality to fuse-dfs as fuse makes things relatively simple. Some other tasks require also augmenting libhdfs to expose more hdfs functionality to C. See [[http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310240&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12312376|contrib/fuse-dfs JIRAs]]
- == Requirements ==
+ === Requirements ===
  * Hadoop with compiled libhdfs.so
  * Linux kernel > 2.6.9 with fuse, which is the default or Fuse 2.7.x, 2.8.x installed. See: [[http://fuse.sourceforge.net/]] or even easier: [[http://dag.wieers.com/rpm/packages/fuse/]]

@@ -63, +65 @@
  (probably the same for others too). See [[https://issues.apache.org/jira/browse/HADOOP-3344|HADOOP-3344]]
  Common build problems include not finding the libjvm.so in JAVA_HOME/jre/lib/OS_ARCH/server or not finding fuse in FUSE_HOME or /usr/local.
- === CONFIGURING ===

@@ -104, +105 @@
  -oprotected=%s (a colon separated list of directories that fuse-dfs should not allow to be deleted or moved - e.g., /user:/tmp)
  -oprivate (not often used but means only the person who does the mount can use the filesystem - aka ! allow_others in fuse speak)
  -ordbuffer=%d (in KBs how large a buffer should fuse-dfs use when doing hdfs reads)
- ro
+ ro
  rw
  -ousetrash (should fuse dfs throw things in /Trash when deleting them)
  -onotrash (opposite of usetrash)

@@ -140, +141 @@
  === KNOWN ISSUES ===
- 1. if you alias `ls` to `ls --color=auto` and try listing a directory with lots (over thousands) of files, expect it to be slow and at 10s of thousands, expect it to be very very slow. This is because `--color=auto` causes ls to stat every file in the directory. Since fuse-dfs does not cache attribute entries when doing a readdir,
+ 1. if you alias `ls` to `ls --color=auto` and try listing a directory with lots (over thousands) of files, expect it to be slow and at 10s of thousands, expect it to be very very slow. This is because `--color=auto` causes ls to stat every file in the directory. Since fuse-dfs does not cache attribute entries when doing a readdir,
  this is very slow. see [[https://issues.apache.org/jira/browse/HADOOP-3797|HADOOP-3797]]
  2. Writes are approximately 33% slower than the DFSClient. TBD how to optimize this. see: [[https://issues.apache.org/jira/browse/HADOOP-3805|HADOOP-3805]] - try using -obig_writes if on a >2.6.26 kernel, should perform much better since bigger writes implies less context switching.
- 3. Reads are ~20-30% slower even with the read buffering.
+ 3. Reads are ~20-30% slower even with the read buffering.
  == Fuse-j-HDFS ==

@@ -183, +184 @@
  https://github.com/brockn/hdfs-nfs-proxy
+ == Hadoofus ==
+
+ https://github.com/cemeyer/hadoofus
+
+ > The hadoofus project is an HDFS (Hadoop Distributed File System) client library. It is implemented in C and supports RPC pipelining and out-of-order execution.
+ > It provides a C API for directly calling Namenode RPCs and performing Datanode block read and write operations, as well as a libhdfs-compatible interface (libhdfs_hadoofus.so).
+ > It also includes a Python wrapper module, implemented in Cython.
+ > Note: This library currently supports the HDFS protocol as spoken by Apache Hadoop releases 0.20.203 through 1.0.3.
+
+ == native-hdfs-fuse ==
+
+ https://github.com/remis-thoughts/native-hdfs-fuse
+
+ > Unlike most other FUSE HDFS implementations this implementation doesn't use libhdfs or otherwise start a JVM - it constructs and sends the protocol buffer messages itself.
+ > The implementation supports random file writes too.
+
