Re: FUSE HDFS significantly slower

Allen Wittenauer Tue, 26 Oct 2010 11:36:59 -0700

On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:

> That raises a question that I am currently looking into and would appreciate 
> any and all advice people have.
> 
> We are replacing our current NetApp solution, which has served us well but we 
> have outgrown it.
> 
> I am looking at either upgrading to a bigger and meaner NetApp or possibly 
> going with Hadoop (HDFS and FUSE).


        You'd probably be better off looking at something like Ceph or Lustre, 
which are meant to be fully POSIX compliant.  

> I need to mount the "storage solution" (HDFS or SAN) to about 5 or 6 systems. 
> I'm a little concerned about utilizing HDFS/Fuse for a couple of reasons:
> 1. Performance of FUSE (how does it compare to an iSCSI SAN solution, for 
> example?). I know it probably depends on a lot of things, but just 
> generally speaking, or any experiences anyone has had.

        FUSE in general (regardless of what you're using with it) is going to 
be significantly slower than a kernel-level file system.
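
        One rough way to put a number on that gap for a particular setup is to time the same sequential write against each mount. The sketch below is only an illustration in Python: the function name, file name, and sizes are made up for the example, and it assumes the mounted file system accepts an ordinary create, write, and close. On a local disk the page cache can inflate the result, so treat the figure as a ballpark.

```python
import os
import time


def write_throughput(directory, total_bytes=64 * 1024 * 1024,
                     chunk_bytes=1024 * 1024):
    """Write total_bytes to a scratch file in directory and return MB/s.

    Hypothetical helper for illustration; not part of Hadoop or FUSE.
    """
    chunk = b"\0" * chunk_bytes
    path = os.path.join(directory, "throughput_test.tmp")  # made-up name
    written = 0
    try:
        start = time.perf_counter()
        # Open once in write mode and call write() repeatedly.
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])
                written += n
        # The timer stops after close(), so the final flush is included.
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        # Clean up the scratch file whether or not the write succeeded.
        if os.path.exists(path):
            os.remove(path)
    return written / (1024 * 1024) / elapsed
```

Calling it once with a directory on the FUSE mount (say a hypothetical /mnt/hdfs/tmp) and once with a directory on local disk gives two MB/s figures to compare.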


> 2. Security/permissions (owner of all files shows up as "nobody")

        I doubt anyone has spent any time adding security to the HDFS FUSE port.  
So even though NetApp's Kerberos stack is pretty crappy (3DES only... 
seriously?), you're going to get a better security model with it.

> Another question: Are there other options for mounting HDFS on these 5 or 6 
> systems for pure filesystem access? (using NFS, etc.)

        No.  I keep hoping someone builds a pNFS/NFSv4.1 server on top of 
Hadoop, but alas not yet.

> 
> Thanks everyone!
> 
> -Hazem
> 
> On Oct 26, 2010, at 5:43 AM, Brian Bockelman wrote:
> 
>> In general, unless you run newer kernels and versions of FUSE as that ticket 
>> suggests, it is significantly slower in raw throughput.
>> 
>> However, we generally don't have a day go by at my site where we don't push 
>> FUSE over 30Gbps, as the bandwidth is spread throughout nodes.  
>> Additionally, as we are limited by the latency of spinning disk and random 
>> reads, we aren't particularly hurt by going "only" 60MB/s on our nodes.  If 
>> we wanted to go faster, we would use the native clients.
>> 
>> Of course, if anyone wants to donate a lowly university 1.5PB of SSDs, I'm 
>> all ears :)
>> 
>> Brian
>> 
>> On Oct 26, 2010, at 12:40 AM, Ted Yu wrote:
>> 
>>> https://issues.apache.org/jira/browse/HADOOP-3805 tried to mitigate this
>>> problem.
>>> 
>>> On Mon, Oct 25, 2010 at 10:17 PM, aniket ray <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm seeing in my experiments that Fuse-HDFS is significantly slower (around
>>>> 3x slower) than using the Java hdfs API directly.
>>>> Wanted to ask if this slowness is the norm? Or is there something wrong
>>>> with my configuration?
>>>> Also is this purely JNI slowness or is there something deeper to it?
>>>> 
>>>> 
>>>> My experiment is basically opening a file in write mode and calling write
>>>> multiple times (close to 5GB of data) to write to that file.
>>>> 
>>>> Thanks for the help,
>>>> aniket ray
>>>> 
>> 
>
