[Hadoop Wiki] Update of "MountableHDFS" by Remis

Apache Wiki Fri, 02 May 2014 07:59:50 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "MountableHDFS" page has been changed by Remis:
https://wiki.apache.org/hadoop/MountableHDFS?action=diff&rev1=18&rev2=19

Comment:
Add some more FUSE implementations

  
  
  
- These projects (enumerated below) allow HDFS to be mounted (on most flavors 
of Unix) as a standard file system using the mount command.  Once mounted, the 
user can operate on an instance of hdfs using standard Unix utilities such as 
'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like 
open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc. 
+ These projects (enumerated below) allow HDFS to be mounted (on most flavors 
of Unix) as a standard file system using the mount command.  Once mounted, the 
user can operate on an instance of hdfs using standard Unix utilities such as 
'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like 
open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.
  
  All, except HDFS NFS Proxy, are based on the Filesystem in Userspace project 
FUSE ([[http://fuse.sourceforge.net/]]). The WebDAV-based one can be used with 
other WebDAV tools, but requires FUSE to actually mount.
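Once HDFS is mounted, ordinary file APIs work unchanged. Below is a minimal sketch using only Python's standard `os` module (open, write, read, close, mkdir, listdir). The directory name `demo`, the file name `hello.txt` and the function `roundtrip` are hypothetical. Because the function takes the mount root as an argument, you can try it on a local directory first.

```python
import os
from typing import List


def roundtrip(mount_root: str, payload: bytes) -> List[str]:
    """Write payload to <mount_root>/demo/hello.txt with POSIX-style calls,
    read it back, and return the sorted directory listing (like `ls`)."""
    demo_dir = os.path.join(mount_root, "demo")      # hypothetical directory
    os.makedirs(demo_dir, exist_ok=True)             # like `mkdir -p`
    path = os.path.join(demo_dir, "hello.txt")

    # HDFS files cannot be rewritten in place, so remove an old copy
    # instead of opening it with O_TRUNC.
    if os.path.exists(path):
        os.remove(path)

    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        view = memoryview(payload)
        while view:                                  # write() may be partial
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)

    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, 64 * 1024)
            if not chunk:                            # EOF
                break
            chunks.append(chunk)
    finally:
        os.close(fd)

    if b"".join(chunks) != payload:
        raise IOError("read-back mismatch for " + path)
    return sorted(os.listdir(demo_dir))
```

On a real mount you would call it as `roundtrip("/mnt/hdfs", b"hello\n")`, where `/mnt/hdfs` is a placeholder for wherever HDFS is mounted.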
  
@@ -28, +28 @@

   * webdav - HDFS exposed as a WebDAV resource
   * MapR - contains a closed-source, HDFS-compatible file system that supports 
read/write NFS access
   * [[https://github.com/cloudera/hdfs-nfs-proxy|HDFS NFS Proxy]] - exports 
HDFS as NFS without the use of FUSE. Supports Kerberos and reorders writes so 
they are written to HDFS sequentially.
+  * hadoofus - a FUSE implementation in C for Hadoop 0.20.203 through 1.0.3
+  * native-hdfs-fuse - a FUSE implementation in C that supports random writes
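HDFS files are written sequentially, but NFS clients may send WRITE requests out of order. The sketch below shows one way to reorder them, assuming writes never overlap. It is not the HDFS NFS Proxy's actual code. The class name `SequentialWriter` is hypothetical, and the list `sink` stands in for an HDFS output stream.

```python
from typing import Dict, List


class SequentialWriter:
    """Buffer out-of-order (offset, data) writes and release them strictly
    in offset order, as an append-only store such as HDFS requires.
    Simplified: assumes writes never overlap one another."""

    def __init__(self) -> None:
        self.next_offset = 0                  # first byte not yet written
        self.pending: Dict[int, bytes] = {}   # offset -> data waiting its turn
        self.sink: List[bytes] = []           # stand-in for the HDFS stream

    def write(self, offset: int, data: bytes) -> None:
        if offset < self.next_offset:
            raise ValueError("rewriting already-written data is not supported")
        self.pending[offset] = data           # a retransmission just replaces it
        # Flush every buffered chunk that now lines up with the write position.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.sink.append(chunk)
            self.next_offset += len(chunk)
```

A write that arrives early stays in `pending` until the gap before it is filled. After that, every contiguous chunk is appended in one pass.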
  
  == Supported Operating Systems ==
  
@@ -38, +40 @@

  
  Supports reads, writes, and directory operations (e.g., cp, ls, more, cat, 
find, less, rm, mkdir, mv, rmdir).  Things like touch, chmod, chown, and 
permissions are in the works. Fuse-dfs currently shows all files as owned by 
nobody.
  
- == Contributing ==
+ === Contributing ===
  
  It's pretty straightforward to add functionality to fuse-dfs, as FUSE makes 
things relatively simple. Some other tasks also require augmenting libhdfs to 
expose more HDFS functionality to C. See 
[[http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310240&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12312376|contrib/fuse-dfs JIRAs]]
  
- == Requirements ==
+ === Requirements ===
  
   * Hadoop with compiled libhdfs.so
   * Linux kernel > 2.6.9 with FUSE (the default), or FUSE 2.7.x or 2.8.x 
installed. See: [[http://fuse.sourceforge.net/]] or, even easier, 
[[http://dag.wieers.com/rpm/packages/fuse/]]
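The kernel-version requirement can be checked with Python's standard library. This is a minimal sketch that assumes the release string starts with dotted numbers, as `uname -r` prints on Linux. The 2.6.9 and 2.6.26 thresholds come from this page (the latter from the -obig_writes note under KNOWN ISSUES). The function names are hypothetical.

```python
import platform
import re
from typing import Dict, Optional, Tuple


def kernel_version(release: Optional[str] = None) -> Tuple[int, ...]:
    """Parse the leading numeric part of a kernel release string,
    e.g. "2.6.18-194.el5" -> (2, 6, 18). Defaults to the running kernel."""
    release = release if release is not None else platform.release()
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_kernel(release: Optional[str] = None) -> Dict[str, bool]:
    v = kernel_version(release)
    return {
        "fuse_builtin": v > (2, 6, 9),   # "Linux kernel > 2.6.9" requirement
        "big_writes": v > (2, 6, 26),    # needed for -obig_writes
    }
```

Called with no argument, `check_kernel()` inspects the running kernel.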
@@ -63, +65 @@

  (probably the same for others too). See 
[[https://issues.apache.org/jira/browse/HADOOP-3344|HADOOP-3344]]
  
  Common build problems include not finding libjvm.so in 
JAVA_HOME/jre/lib/OS_ARCH/server or not finding FUSE in FUSE_HOME or /usr/local.
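Before building, it can help to confirm that the libjvm.so path the build expects actually exists. This is a minimal sketch, assuming the pre-Java 9 JRE layout named above. It also assumes a mapping from `platform.machine()` names to OS_ARCH directory names (`x86_64` to `amd64`, `i686`/`i386` to `i386`). `find_libjvm` is a hypothetical helper.

```python
import os
import platform
from typing import Optional

# Assumed mapping from `uname -m` names to the OS_ARCH directory names used
# by older JRE layouts; unknown machines fall through unchanged.
_JRE_ARCH = {"x86_64": "amd64", "i686": "i386", "i386": "i386"}


def find_libjvm(java_home: Optional[str] = None,
                machine: Optional[str] = None) -> str:
    """Return JAVA_HOME/jre/lib/OS_ARCH/server/libjvm.so, or raise if missing."""
    java_home = java_home or os.environ.get("JAVA_HOME")
    if not java_home:
        raise EnvironmentError("JAVA_HOME is not set")
    machine = machine or platform.machine()
    os_arch = _JRE_ARCH.get(machine, machine)
    candidate = os.path.join(java_home, "jre", "lib", os_arch,
                             "server", "libjvm.so")
    if not os.path.isfile(candidate):
        raise FileNotFoundError(candidate)
    return candidate
```

If this raises `FileNotFoundError`, the printed path shows which directory the build will fail to find.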
- 
  
  === CONFIGURING ===
  
@@ -104, +105 @@

  -oprotected=%s (a colon-separated list of directories that fuse-dfs should 
not allow to be deleted or moved - e.g., /user:/tmp)
  -oprivate (not often used, but means only the person who does the mount can 
use the filesystem - i.e., not allow_other in FUSE terms)
  -ordbuffer=%d (size, in KB, of the buffer fuse-dfs should use when doing HDFS 
reads)
- ro 
+ ro
  rw
  -ousetrash (whether fuse-dfs should move files to /Trash when deleting them)
  -onotrash (opposite of usetrash)
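If you script the mount, the options listed above can be assembled programmatically. This is a minimal sketch: `fuse_dfs_options` is a hypothetical helper. It assumes that `ro`/`rw` are passed with `-o`, like the other options, which this page does not state. You would append the resulting list to whatever command line launches fuse-dfs on your system.

```python
from typing import List, Optional, Sequence


def fuse_dfs_options(protected: Sequence[str] = (),
                     private: bool = False,
                     rdbuffer_kb: Optional[int] = None,
                     read_only: bool = False,
                     use_trash: Optional[bool] = None) -> List[str]:
    """Translate settings into the -o flags listed above (illustrative)."""
    opts: List[str] = []
    if protected:
        # colon-separated paths that must not be deleted or moved
        opts.append("-oprotected=" + ":".join(protected))
    if private:
        opts.append("-oprivate")
    if rdbuffer_kb is not None:
        if rdbuffer_kb <= 0:
            raise ValueError("rdbuffer must be a positive number of KB")
        opts.append(f"-ordbuffer={rdbuffer_kb}")
    # Assumption: ro/rw take the same -o prefix as the other options.
    opts.append("-oro" if read_only else "-orw")
    if use_trash is not None:
        opts.append("-ousetrash" if use_trash else "-onotrash")
    return opts
```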
@@ -140, +141 @@

  
  === KNOWN ISSUES ===
  
- 1. if you alias `ls` to `ls --color=auto` and try listing a directory with 
lots (over thousands) of files, expect it to be slow and at 10s of thousands, 
expect it to be very very slow.  This is because `--color=auto` causes ls to 
stat every file in the directory. Since fuse-dfs does not cache attribute 
entries when doing a readdir, 
+ 1. if you alias `ls` to `ls --color=auto` and try listing a directory with 
lots (over thousands) of files, expect it to be slow and at 10s of thousands, 
expect it to be very very slow.  This is because `--color=auto` causes ls to 
stat every file in the directory. Since fuse-dfs does not cache attribute 
entries when doing a readdir,
  this is very slow. See 
[[https://issues.apache.org/jira/browse/HADOOP-3797|HADOOP-3797]]
  
  2. Writes are approximately 33% slower than the DFSClient; how to optimize 
this is TBD. See: 
[[https://issues.apache.org/jira/browse/HADOOP-3805|HADOOP-3805]]. Try using 
-obig_writes if on a kernel > 2.6.26; it should perform much better, since 
bigger writes imply less context switching.
  
- 3. Reads are ~20-30% slower even with the read buffering. 
+ 3. Reads are ~20-30% slower even with the read buffering.
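The `ls --color=auto` slowdown in issue 1 comes from one extra metadata call per directory entry. The sketch below contrasts a names-only listing with a listing that stats every entry, which roughly mimics what the coloured `ls` does. Both function names are hypothetical.

```python
import os
from typing import List, Tuple


def list_names(path: str) -> List[str]:
    """Plain listing: a single readdir, no per-entry stat()."""
    return sorted(os.listdir(path))


def list_with_stat(path: str) -> Tuple[List[str], int]:
    """Roughly what `ls --color=auto` does: readdir, then one stat() per
    entry to choose a colour. Returns the names and the number of stat calls."""
    names = sorted(os.listdir(path))
    stat_calls = 0
    for name in names:
        # On fuse-dfs, with no attribute caching on readdir, each call here
        # may turn into a separate getattr request against HDFS.
        os.lstat(os.path.join(path, name))
        stat_calls += 1
    return names, stat_calls
```

For a directory of 10,000 files, the second form makes 10,000 extra metadata calls. Running `\ls`, which bypasses a shell alias, should avoid the colour-driven stats.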
  
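The -obig_writes advice in issue 2 is a matter of request counts: each FUSE WRITE request costs a round trip between the kernel and fuse-dfs. This back-of-the-envelope sketch assumes a 4 KiB per-request cap without big_writes and a 128 KiB cap with it (the usual FUSE defaults of that era; check your kernel). It also assumes the application itself writes in large enough chunks.

```python
PAGE = 4 * 1024           # assumed per-request cap without -obig_writes
BIG_WRITE = 128 * 1024    # assumed default max_write with -obig_writes


def fuse_write_requests(total_bytes: int, max_write: int) -> int:
    """Number of FUSE WRITE requests needed to push total_bytes through a
    mount whose per-request cap is max_write (integer ceiling division)."""
    if max_write <= 0:
        raise ValueError("max_write must be positive")
    return (total_bytes + max_write - 1) // max_write
```

For a 1 GiB file this gives 262,144 requests at 4 KiB versus 8,192 at 128 KiB, a 32-fold reduction in context switches.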
  
  == Fuse-j-HDFS ==
@@ -183, +184 @@

  
  https://github.com/brockn/hdfs-nfs-proxy
  
+ == Hadoofus ==
+ 
+ https://github.com/cemeyer/hadoofus
+ 
+ > The hadoofus project is an HDFS (Hadoop Distributed File System) client 
library. It is implemented in C and supports RPC pipelining and out-of-order 
execution.
+ > It provides a C API for directly calling Namenode RPCs and performing 
Datanode block read and write operations, as well as a libhdfs-compatible 
interface (libhdfs_hadoofus.so).
+ > It also includes a Python wrapper module, implemented in Cython.
+ > Note: This library currently supports the HDFS protocol as spoken by Apache 
Hadoop releases 0.20.203 through 1.0.3.
+ 
+ == native-hdfs-fuse ==
+ 
+ https://github.com/remis-thoughts/native-hdfs-fuse
+ 
+ > Unlike most other FUSE HDFS implementations this implementation doesn't use 
libhdfs or otherwise start a JVM - it constructs and sends the protocol buffer 
messages itself.
+ > The implementation supports random file writes too.
+
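A random write means overwriting bytes in the middle of an existing file. With a standard library this is a positioned write, sketched below with `os.pwrite` (Unix only). The helper name `patch_file` is hypothetical. Per the README quoted above, native-hdfs-fuse supports this. Append-only mounts such as fuse-dfs may reject it, either at open or at write time.

```python
import os


def patch_file(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at offset without truncating the file."""
    fd = os.open(path, os.O_WRONLY)      # no O_TRUNC: keep existing contents
    try:
        view = memoryview(data)
        while view:                      # pwrite() may be partial
            written = os.pwrite(fd, view, offset)
            view = view[written:]
            offset += written
    finally:
        os.close(fd)
```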
