David Brown wrote:
On Sun, Dec 16, 2007 at 11:23:52AM -0800, Bob La Quey wrote:
On Dec 16, 2007 10:10 AM, David Brown <[EMAIL PROTECTED]> wrote:
I don't think it provides redundancy, at least not on a full-system level
like RAID does.
Let's not let this misperception propagate. ZFS does provide redundancy.
See my previous reply to this thread.
Yes. I did correct this.
ZFS provides functionality similar to RAID-5. Other parity flavors were
described as "in the works" in Jeff's blog; I don't know what the state of
the implementation was.
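To make the parity idea concrete, here is a toy Python sketch (not ZFS code;
the block contents and names are invented): single parity in the RAID-5 /
RAID-Z style is just an XOR across the data blocks in a stripe, which lets
you rebuild any one lost block from the survivors.

# Toy illustration only: single-parity striping in the RAID-5 / RAID-Z style.
# Parity is the XOR of the data blocks, so any one missing block can be
# reconstructed from the remaining blocks plus parity.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks in one stripe
parity = xor_blocks(data)            # parity block written alongside them

# Simulate losing block 1 and rebuilding it from parity plus the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print("reconstructed:", rebuilt)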
The redundancy is done within the filesystem, rather than below, hence
Andrew Morton's layering violation comments.
I'm beginning to wonder if this "layering violation" they've implemented is
actually a good idea. There seems to be almost a battle going on between
MD/LVM and filesystems over write barriers. The filesystems want write
barriers so they can be robust but still fast. But without knowing where
the data is really going, they can't group writes in a way that lets MD/LVM
implement barriers efficiently. Currently Linux doesn't support write
barriers on MD/LVM, and the only real way to implement them would be a full
write synchronization across all of the devices involved in the array,
which defeats most of the benefit of having the write barrier in the first place.
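To illustrate why that is so painful, here is a hypothetical Python sketch
(FakeDisk and everything else here is invented; this is not the Linux MD
API): a barrier across a multi-device array degenerates into "flush every
member and wait", which stalls the whole array at each barrier point.

# Hypothetical sketch: a write barrier over a multi-device array implemented
# as "flush every member device and wait for all of them".
import threading

class FakeDisk:
    def __init__(self, name):
        self.name = name
        self.cache = []          # writes buffered in the "drive cache"
    def write(self, block):
        self.cache.append(block)
    def flush(self):
        # pretend this is an expensive cache-flush command sent to the drive
        print(f"{self.name}: flushing {len(self.cache)} buffered writes")
        self.cache.clear()

def barrier(array):
    """A barrier over the array degenerates into 'flush everything'."""
    threads = [threading.Thread(target=d.flush) for d in array]
    for t in threads: t.start()
    for t in threads: t.join()   # nothing after the barrier proceeds until all flush

array = [FakeDisk(f"sd{c}") for c in "abc"]
for d in array:
    d.write(b"data")
barrier(array)   # every device must sync before any post-barrier write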
Well ...
There are a couple of things here:
First, ZFS starts checksumming *from the initial request*. Consequently, at
every point you know whether the transaction actually committed correctly.
Second, ZFS does lots of on-disk checking with copy-on-write semantics.
This means that your old data is never lost by a write, and your new data
is only considered committed once a read of it validates against the
checksum, which establishes that the transaction is done.
It's the combination of immediate checksum followed by copy-on-write
that gives you your "write barriers" without having real write barriers.
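Here is a conceptual Python sketch of that combination (invented names, not
the ZFS on-disk format): the checksum is taken when the write is issued, the
new data goes copy-on-write to a fresh location, and the transaction is only
treated as committed once a read-back validates against the checksum. A lost
or torn write just leaves the old copy current.

# Conceptual sketch only: checksum-at-write plus copy-on-write giving
# commit semantics without an explicit barrier.
import hashlib

storage = {}                       # block address -> bytes ("the disk")
next_addr = 0

def cow_write(data):
    """Write data to a fresh address and return (addr, checksum)."""
    global next_addr
    addr, next_addr = next_addr, next_addr + 1
    storage[addr] = data           # old blocks are never overwritten
    return addr, hashlib.sha256(data).hexdigest()

def committed(addr, checksum):
    """The write only 'counts' if a read-back matches the checksum."""
    block = storage.get(addr)
    return block is not None and hashlib.sha256(block).hexdigest() == checksum

old_addr, old_sum = cow_write(b"old contents")
new_addr, new_sum = cow_write(b"new contents")

if committed(new_addr, new_sum):
    current = new_addr             # flip the pointer: transaction is done
else:
    current = old_addr             # fall back to the intact old copy
print("current block:", storage[current])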
Yes, there is a performance issue--the read has to complete before the
barrier is satisfied. However, there is *always* some performance
implication when you have a "barrier" of any form.
Now, if the physical disks are doing something tricky like serving a
read from a cache without writing it, all bets are off (but that is no
different from any other filesystem).
Third, the "telescoping" argument is a religious argument. Some of us
want a storage layer that handles migrating, mapping, extending, etc.
without human interaction. Some of us don't.
The big problem here is that we really don't get a choice. It's a
matter of time. Hard read errors are coming given the size of files
we're using. It is simply going to be too difficult to cope with in any
way other than automatically. The filesystem is going to have to have
the ability to duplicate, migrate, and recopy data automatically to deal
with that.
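As a rough illustration of what "automatically" means here (a Python sketch
with invented names, not ZFS internals): with per-block checksums and a
redundant copy, a bad read can be detected and silently repaired from the
good copy during normal operation, with no operator involvement.

# Illustrative sketch: detect a checksum mismatch on read and repair the bad
# copy from a redundant good copy.
import hashlib

def sha(data):
    return hashlib.sha256(data).hexdigest()

good = b"important data"
expected = sha(good)

copies = {"diskA": bytearray(good), "diskB": bytearray(good)}
copies["diskA"][0] ^= 0xFF         # simulate a corrupted sector on diskA

def self_healing_read(copies, expected):
    for name, block in copies.items():
        if sha(bytes(block)) == expected:
            # found a good copy: repair any bad copies from it, then return it
            for other, blk in copies.items():
                if sha(bytes(blk)) != expected:
                    copies[other] = bytearray(block)
                    print(f"repaired {other} from {name}")
            return bytes(block)
    raise IOError("no valid copy left")

print(self_healing_read(copies, expected))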
This argument gets repeated every time some new technology is going to
abstract away a significant chunk of something which is currently a
manual task (compilers replacing assembly language, garbage collection
replacing manual memory management, preemptive multitasking OSes replacing
cooperative, etc.)
Eventually, ZFS (or something like it) will win.
However, I'm not sure this is solvable anyway, unless drive manufacturers
come up with a way of getting write barriers to work across drives.
As I said, the immediate checksum combined with copy-on-write followed
by verification read solves it without barriers.
In addition, there is nothing in ZFS that says you can't have a very
fast, battery-backed storage area that ZFS commits to first and then
slowly replicates to other drives later. I don't believe that this is
implemented yet, however.
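A hypothetical sketch of what that could look like (invented names, and
again, not a shipping ZFS feature as far as I know): commit synchronously to
the small, fast, battery-backed device, then trickle the same blocks out to
the slow disks in the background.

# Hypothetical sketch: synchronous commit to a fast log device, asynchronous
# replication to slower disks afterwards.
import queue, threading, time

fast_log = []                       # stands in for NVRAM / flash log device
slow_disks = []                     # stands in for the spinning pool
pending = queue.Queue()

def commit(data):
    """Synchronous part: the write is 'safe' once it hits the fast log."""
    fast_log.append(data)
    pending.put(data)               # queue it for the slow tier

def replicator():
    """Background part: drain the log to the slow disks at their own pace."""
    while True:
        data = pending.get()
        if data is None:
            break
        time.sleep(0.01)            # pretend the slow disks take a while
        slow_disks.append(data)

t = threading.Thread(target=replicator)
t.start()
for i in range(5):
    commit(f"block {i}".encode())   # returns as soon as the fast log has it
pending.put(None)
t.join()
print("fast log:", len(fast_log), "slow disks:", len(slow_disks))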
In my opinion, this will be the killer application that shoves ZFS into
everything. With the increasing importance of flash, the whole question of
multiple physical devices with very different read/write characteristics is
going to become a greater and greater issue.
-a
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list