David Brown wrote:
On Sun, Dec 16, 2007 at 11:23:52AM -0800, Bob La Quey wrote:
On Dec 16, 2007 10:10 AM, David Brown <[EMAIL PROTECTED]> wrote:
I don't think it provides redundancy, at least not on a full-system level
like RAID does.
Let's not let this misperception propagate. ZFS does provide redundancy.
See my previous reply to this thread.
Yes. I did correct this.
ZFS provides functionality similar to RAID-5. Other parity flavors were
described as "in the works" in Jeff's blog; I don't know what the state of
the implementation was.
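To make the parity idea concrete, here is a toy Python sketch (not ZFS code;
the block contents and names are invented): single parity in the RAID-5 /
RAID-Z style is just an XOR across the data blocks in a stripe, which lets
you rebuild any one lost block from the survivors.

# Toy illustration only: single-parity striping in the RAID-5 / RAID-Z style.
# Parity is the XOR of the data blocks, so any one missing block can be
# reconstructed from the remaining blocks plus parity.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks in one stripe
parity = xor_blocks(data)            # parity block written alongside them

# Simulate losing block 1 and rebuilding it from parity plus the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print("reconstructed:", rebuilt)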
The redundancy is done within the filesystem, rather than below, hence
Andrew Morton's layering violation comments.
I'm beginning to wonder if this "layering violation" they've implemented is
actually a good idea. There seems to be almost a battle going on between
MD/LVM and filesystems over write barriers. The filesystems want write
barriers so they can be robust but still fast. But without knowing where
the data is really going, they can't group writes in a way that lets MD/LVM
implement barriers efficiently. Currently Linux doesn't support write
barriers on MD/LVM, and the only real way to implement them would be a full
write synchronization across all of the devices involved in the array,
which defeats most of the benefit of having the write barrier in the first place.
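To illustrate why that is so painful, here is a hypothetical Python sketch
(FakeDisk and everything else here is invented; this is not the Linux MD
API): a barrier across a multi-device array degenerates into "flush every
member and wait", which stalls the whole array at each barrier point.

# Hypothetical sketch: a write barrier over a multi-device array implemented
# as "flush every member device and wait for all of them".
import threading

class FakeDisk:
    def __init__(self, name):
        self.name = name
        self.cache = []          # writes buffered in the "drive cache"
    def write(self, block):
        self.cache.append(block)
    def flush(self):
        # pretend this is an expensive cache-flush command sent to the drive
        print(f"{self.name}: flushing {len(self.cache)} buffered writes")
        self.cache.clear()

def barrier(array):
    """A barrier over the array degenerates into 'flush everything'."""
    threads = [threading.Thread(target=d.flush) for d in array]
    for t in threads: t.start()
    for t in threads: t.join()   # nothing after the barrier proceeds until all flush

array = [FakeDisk(f"sd{c}") for c in "abc"]
for d in array:
    d.write(b"data")
barrier(array)   # every device must sync before any post-barrier write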
Well ...
There are a couple of things here:
First, ZFS starts checksumming *from the initial request*. Consequently, at
every point you know whether the transaction actually committed correctly.
Second, ZFS does lots of on-disk checking with copy-on-write semantics.
This means that your old data is never lost by a write, and your new data
is only considered committed once a read of it validates against the
checksum, which establishes that the transaction is done.
It's the combination of immediate checksum followed by copy-on-write
that gives you your "write barriers" without having real write barriers.
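Here is a conceptual Python sketch of that combination (invented names, not
the ZFS on-disk format): the checksum is taken when the write is issued, the
new data goes copy-on-write to a fresh location, and the transaction is only
treated as committed once a read-back validates against the checksum. A lost
or torn write just leaves the old copy current.

# Conceptual sketch only: checksum-at-write plus copy-on-write giving
# commit semantics without an explicit barrier.
import hashlib

storage = {}                       # block address -> bytes ("the disk")
next_addr = 0

def cow_write(data):
    """Write data to a fresh address and return (addr, checksum)."""
    global next_addr
    addr, next_addr = next_addr, next_addr + 1
    storage[addr] = data           # old blocks are never overwritten
    return addr, hashlib.sha256(data).hexdigest()

def committed(addr, checksum):
    """The write only 'counts' if a read-back matches the checksum."""
    block = storage.get(addr)
    return block is not None and hashlib.sha256(block).hexdigest() == checksum

old_addr, old_sum = cow_write(b"old contents")
new_addr, new_sum = cow_write(b"new contents")

if committed(new_addr, new_sum):
    current = new_addr             # flip the pointer: transaction is done
else:
    current = old_addr             # fall back to the intact old copy
print("current block:", storage[current])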
Yes, there is a performance issue--the read has to complete before the
barrier is satisfied. However, there is *always* some performance
implication when you have a "barrier" of any form.
Now, if the physical disks are doing something tricky like serving a
read from a cache without writing it, all bets are off (but that is no
different from any other filesystem).
Third, the "telescoping" argument is a religious argument. Some of us
want a storage layer that handles migrating, mapping, extending, etc.
without human interaction. Some of us don't.
The big problem here is that we really don't get a choice. It's a
matter of time. Hard read errors are coming given the size of files
we're using. It is simply going to be too difficult to cope with in any
way other than automatically. The filesystem is going to have to have
the ability to duplicate, migrate, and recopy data automatically to deal
with that.
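As a rough illustration of what "automatically" means here (a Python sketch
with invented names, not ZFS internals): with per-block checksums and a
redundant copy, a bad read can be detected and silently repaired from the
good copy during normal operation, with no operator involvement.

# Illustrative sketch: detect a checksum mismatch on read and repair the bad
# copy from a redundant good copy.
import hashlib

def sha(data):
    return hashlib.sha256(data).hexdigest()

good = b"important data"
expected = sha(good)

copies = {"diskA": bytearray(good), "diskB": bytearray(good)}
copies["diskA"][0] ^= 0xFF         # simulate a corrupted sector on diskA

def self_healing_read(copies, expected):
    for name, block in copies.items():
        if sha(bytes(block)) == expected:
            # found a good copy: repair any bad copies from it, then return it
            for other, blk in copies.items():
                if sha(bytes(blk)) != expected:
                    copies[other] = bytearray(block)
                    print(f"repaired {other} from {name}")
            return bytes(block)
    raise IOError("no valid copy left")

print(self_healing_read(copies, expected))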
This argument gets repeated every time some new technology is going to
abstract away a significant chunk of something which is currently a
manual task (compilers replacing assembly language, garbage collection
replacing manual memory management, preemptive multitasking OSes replacing
cooperative, etc.)
Eventually, ZFS (or something like it) will win.
However, I'm not sure this is solvable anyway, unless drive manufacturers
come up with a way of getting write barriers to work across drives.
As I said, the immediate checksum combined with copy-on-write followed
by verification read solves it without barriers.
In addition, there is nothing in ZFS that says you can't have a very
fast, battery-backed storage area that ZFS commits to first and then
slowly replicates to other drives later. I don't believe that this is
implemented yet, however.
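A hypothetical sketch of what that could look like (invented names, and
again, not a shipping ZFS feature as far as I know): commit synchronously to
the small, fast, battery-backed device, then trickle the same blocks out to
the slow disks in the background.

# Hypothetical sketch: synchronous commit to a fast log device, asynchronous
# replication to slower disks afterwards.
import queue, threading, time

fast_log = []                       # stands in for NVRAM / flash log device
slow_disks = []                     # stands in for the spinning pool
pending = queue.Queue()

def commit(data):
    """Synchronous part: the write is 'safe' once it hits the fast log."""
    fast_log.append(data)
    pending.put(data)               # queue it for the slow tier

def replicator():
    """Background part: drain the log to the slow disks at their own pace."""
    while True:
        data = pending.get()
        if data is None:
            break
        time.sleep(0.01)            # pretend the slow disks take a while
        slow_disks.append(data)

t = threading.Thread(target=replicator)
t.start()
for i in range(5):
    commit(f"block {i}".encode())   # returns as soon as the fast log has it
pending.put(None)
t.join()
print("fast log:", len(fast_log), "slow disks:", len(slow_disks))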
In my opinion, this will be the killer application that shoves ZFS into
everything. With the increasing importance of flash, the whole question of
multiple physical devices with very different read/write characteristics is
going to become a greater and greater issue.
-a
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list