Re: HDFS vs. CIFS

Ted Dunning Wed, 17 Oct 2007 09:33:17 -0700

I recommend you look at MogileFS as well.

Its intended usage is very different from Hadoop's.  Hadoop is intended to
store fewer (<20 million) larger (preferably >10MB) files.  It provides
strong capabilities for splitting large files across machines and hooks for
disk-local computation.
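
To make the file-splitting point concrete, here is a rough sketch in
Python.  It is not Hadoop code: the function names, the 64MB block size,
the replica count of 3, and the round-robin placement are all assumptions
made for illustration, and real HDFS placement is more involved.

```python
def split_into_blocks(file_size, block_size=64 * 1024 * 1024):
    """Return (offset, length) pairs covering a file of file_size bytes.

    block_size is an assumed value chosen for illustration.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_blocks(blocks, hosts, replicas=3):
    """Assign each block to `replicas` distinct hosts.

    Round-robin is a deliberate simplification, not the real policy.
    """
    if replicas > len(hosts):
        raise ValueError("need at least as many hosts as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [hosts[(i + r) % len(hosts)]
                            for r in range(replicas)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)
    for block, where in place_blocks(blocks, ["a", "b", "c", "d"]).items():
        print(block, where)
```

Because every block lives on several machines, a computation over one
block can be scheduled on a host that already holds it, which is what
the local-computation hooks are for.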


Mogile is intended for large numbers of smaller files, each stored whole
in a conventional file system, spread across a large collection of hosts.
It provides no file splitting and no local computing hooks.  There is,
however, a framework for running scripts on large collections of files.

Neither of these exposes a normal file system, but there is a FUSE
implementation for MogileFS and an early WebDAV interface for HDFS.  Both of
these should become more mature over time.

Whether either is useful for you is up to you.

For that matter, you might look at KFS as well.  I know nothing about it,
but the authors seem to like it.  :-)

See http://blog.kosmix.com/2007/09/kosmos_filesystem_release.html

On 10/17/07 6:19 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Fair enough...
> 
> I appreciate your reply and apologize for my misinterpretation of your
> intent.  Someone else pointed me to the architecture documentation, which I
> shall peruse in order to gain a better understanding of this product.
> 
> The primary feature that attracted me to Hadoop was the ability to maintain
> a single namespace across resources.  This will become increasingly
> important as we add logical volumes to our storage array, whether they be
> NetApps, DMX3, or commodity hardware (servers).  I have, up until I came
> across Hadoop, been focusing primarily on CIFS and want to further
> investigate other distributed file systems in order to either rule them out
> or to further realize their capabilities and how they may apply to the
> problem at hand.
> 
> Thank you all for your replies.
> 
> Trevor Stewart
> Union Pacific Railroad
> 
> 
> From: Ted Dunning <[EMAIL PROTECTED]m>
> To: <[email protected]>
> Date: 10/16/2007 12:53 PM
> Subject: Re: HDFS vs. CIFS
> Reply-To: [EMAIL PROTECTED]e.apache.org
> 
> Apologies off-list.  That wasn't intended to be rude.
> 
> 
> On 10/16/07 10:46 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> 
>> Well then...color me humbled Mr. Dunning.
>> 
>> I apologize for monopolizing your quite obviously precious time.
>> 
>> BTW...I don't believe these questions are answered in the FAQ.
>> 
>> Thank you for making the open source experience SO enjoyable.
>> 
>> 
>> From: Ted Dunning <[EMAIL PROTECTED]m>
>> To: <[email protected]>
>> Date: 10/16/2007 12:32 PM
>> Subject: Re: HDFS vs. CIFS
>> Reply-To: [EMAIL PROTECTED]e.apache.org
>> 
>> First, it is PETAbytes, not petRabytes.
>> 
>> Secondly, if you are committed to using NetApps or DMX3, then you really
>> don't need (or want) HDFS.
>> 
>> Thirdly, if you are committed to using a distributed file store like HDFS
>> (or MogileFS or KFS), then you don't need NetApps.  Distributed file
>> systems were designed exactly to eliminate the need for highly engineered
>> storage systems by allowing the use of entire redundant computers rather
>> than cleverly interconnected disks.
>> 
>> So you really have two classes of designs:
>> 
>> A) traditional big iron
>> 
>> B) trendy, but not entirely ready for prime time distributed file stores
>> like HDFS
>> 
>> The first option will probably work and will cost about 2x more (based on
>> my experience, your mileage will vary).  The second option will require
>> more hand-holding and won't come with a support contract, but you would
>> be able to do some things with it that are impossible in a traditional
>> sense.
>> 
>> 
>> My guess is that if you are still asking basic questions like this that
>> are answered in the FAQ, then you will be better off paying NetApp for
>> engineering time than building this system on your own.
>> 
>> 
>> On 10/16/07 8:52 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> 
>>> Hmmm...OK...
>>> 
>>> Let me explain my requirements here and see if you all can tell me if
>>> Hadoop provides the functionality I need.
>>> 
>>> I'm building a highly performant, highly available (no less than 4 9's),
>>> raw storage subsystem.  It will be write once for the initial dataset
>>> (binary data) but will have the ability to maintain metadata associated
>>> with the binary data.  The metadata will be "queryable" and therefore
>>> indexed (want to use Lucene for this purpose).  It must have the ability
>>> to store petrabytes of data.  We will use either NetApps or DMX3 storage
>>> media.
>>> 
>>> Please discuss...
>>> 
>>> 
>>> From: "Joydeep Sen Sarma" <[EMAIL PROTECTED].com>
>>> To: <[email protected]>
>>> Date: 10/15/2007 05:20 PM
>>> Subject: RE: HDFS vs. CIFS
>>> Reply-To: [EMAIL PROTECTED]e.apache.org
>>> 
>>> Not a valid comparison. CIFS is a remote file access protocol only. HDFS
>>> is a file system (that comes bundled with a remote file access
>>> protocol).
>>> 
>>> It may be possible to build a CIFS gateway for HDFS.
>>> 
>>> One interesting point of comparison at the protocol level is the level
>>> of parallelism. Compared to HDFS protocol - CIFS exposes less
>>> parallelism. DFS/CIFS has the concept of junction points that allows
>>> directories from different storage servers to be stitched into one
>>> namespace. There are commercial products that make this easy. However -
>>> this allows parallelism at directory level only - whereas HDFS protocol
>>> allows a single file to be distributed across different servers.
>>> 
>>> (And as was pointed out - CIFS supports many other file system
>>> operations - ACLs, oplocks and what not that HDFS doesn't).
>>> 
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, October 15, 2007 12:24 PM
>>> To: [email protected]
>>> Subject: HDFS vs. CIFS
>>> 
>>> 
>>> Could someone compare and contrast CIFS and HDFS?  Or...if that is not
>>> a valid comparison...please explain to me why it's not a valid
>>> comparison.
>>> 
>>> Thanks,
>>> Trevor
>>> 
>>> .
>>> This message and any attachments contain information from Union Pacific
>>> which may be confidential and/or privileged.
>>> If you are not the intended recipient, be aware that any disclosure,
>>> copying, distribution or use of the contents of this message is strictly
>>> prohibited by law. If you receive this message in error, please contact
>>> the sender immediately and delete the message and any attachments.
