On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:
> That raises a question that I am currently looking into and would appreciate
> any and all advice people have.
>
> We are replacing our current NetApp solution, which has served us well but we
> have outgrown it.
>
> I am looking at either upgrading to a bigger and meaner NetApp or possibly
> going with Hadoop (HDFS and Fuse).
You'd probably be better off looking at something like Ceph or Lustre,
which are meant to be fully POSIX compliant.
> I need to mount the "storage solution" (HDFS or SAN) to about 5 or 6 systems.
> I'm a little concerned about utilizing HDFS/Fuse for a couple of reasons:
> 1. Performance of Fuse (how does it compare to an iSCSI SAN solution for
> example)...i know, it probably depends on a lot of things, but just
> generally-speaking or any experiences anyone has had
FUSE in general (regardless of what you're using with it) is going to
be significantly slower than a kernel-level file system.
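
If you want a rough number for your own hardware rather than a general
answer, you can time the same sequential write against each candidate
mount. The sketch below is only an illustration: the function name and
the paths you pass in are hypothetical, and it measures one large
sequential write, not random I/O or metadata operations.

```python
import os
import sys
import time


def measure_write_throughput(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes to path in chunk_size pieces and return MB/s."""
    chunk = b"\0" * chunk_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        try:
            # Ask the OS to push the data out; some FUSE mounts may
            # not support this, so a failure here is ignored.
            os.fsync(f.fileno())
        except OSError:
            pass
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Each argument is a file path to write, e.g. one on the FUSE mount
    # and one on local disk (both paths are whatever you choose).
    for target in sys.argv[1:]:
        rate = measure_write_throughput(target, 256 * 1024 * 1024)
        print(f"{target}: {rate:.1f} MB/s")
```

Run it with one file path per mount you want to compare, and use a file
size large enough that the page cache does not dominate the result.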
> 2. Security/permissions (owner of all files shows up as "nobody")
I doubt anyone has spent any time adding security to the HDFS FUSE port.
So even though NetApp's Kerberos stack is pretty crappy (3DES only...
seriously?), you're going to get a better security model with it.
> Another question: Are there other options for mounting HDFS on these 5 or 6
> systems for pure filesystem access? (using NFS, etc.)
No. I keep hoping someone builds a pNFS/NFSv4.1 server on top of
Hadoop, but alas not yet.
>
> Thanks everyone!
>
> -Hazem
>
> On Oct 26, 2010, at 5:43 AM, Brian Bockelman wrote:
>
>> In general, unless you run newer kernels and versions of FUSE as that ticket
>> suggests, it is significantly slower in raw throughput.
>>
>> However, we generally don't have a day go by at my site where we don't push
>> FUSE over 30Gbps, as the bandwidth is spread throughout nodes.
>> Additionally, as we are limited by the latency of spinning disk and random
>> reads, we aren't particularly hurt by going "only" 60MB/s on our nodes. If
>> we wanted to go faster, we would use the native clients.
>>
>> Of course, if anyone wants to donate a lowly university 1.5PB of SSDs, I'm
>> all ears :)
>>
>> Brian
>>
>> On Oct 26, 2010, at 12:40 AM, Ted Yu wrote:
>>
>>> https://issues.apache.org/jira/browse/HADOOP-3805 tried to mitigate this
>>> problem.
>>>
>>> On Mon, Oct 25, 2010 at 10:17 PM, aniket ray <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm seeing in my experiments that Fuse-HDFS is significantly slower (around
>>>> 3x slower) than using the Java hdfs API directly.
>>>> Wanted to ask if this slowness is the norm, or is there something wrong
>>>> with my configuration?
>>>> Also is this purely JNI slowness or is there something deeper to it?
>>>>
>>>>
>>>> My experiment is basically opening a file in write mode and calling writes
>>>> multiple times (close to 5GB data) to write to that file.
>>>>
>>>> Thanks for the help,
>>>> aniket ray
>>>>
>>
>