Re: [RFC] api for consistent lvm snapshots

Chris Mason Wed, 04 Oct 2000 14:21:55 -0700


--On 10/04/00 16:32:00 +0200 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

> Chris Mason wrote:

>> [ why are there two calls to sync_supers ]
>> 
>> The first one is a call to sync_supers, which will skip the call to the
>> FS write_super method if the super is not dirty.  The second is a call to
>> sync_supers_lockfs, which will always call the FS write_super_lockfs
>> method.
> 
> So if the superblock *is* dirty it will be written twice?
> 

Only if you choose to write the super in write_super_lockfs.
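
To make the two passes concrete, here is a small userspace model in C. It is
only a sketch, not the kernel code: `model_super`, `model_sync_supers`,
`model_sync_supers_lockfs` and the `demo_*` helpers are hypothetical names
invented for illustration. It assumes one filesystem whose write_super_lockfs
only freezes and another whose write_super_lockfs also writes the super, which
is the one case where a dirty super ends up written twice.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's structures. */
struct model_super;

struct model_super_ops {
    void (*write_super)(struct model_super *sb);
    void (*write_super_lockfs)(struct model_super *sb); /* optional, may be NULL */
};

struct model_super {
    int dirty;   /* nonzero if the super needs writing */
    int writes;  /* how many times the super was written */
    int locked;  /* nonzero once the lockfs hook has run */
    const struct model_super_ops *ops;
};

/* First pass: call write_super only for supers that are dirty. */
void model_sync_supers(struct model_super *sbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(sbs[i].ops != NULL);
        if (sbs[i].dirty && sbs[i].ops->write_super)
            sbs[i].ops->write_super(&sbs[i]);
    }
}

/* Second pass: always call write_super_lockfs when the FS provides it. */
void model_sync_supers_lockfs(struct model_super *sbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(sbs[i].ops != NULL);
        if (sbs[i].ops->write_super_lockfs)
            sbs[i].ops->write_super_lockfs(&sbs[i]);
    }
}

/* Hypothetical FS hooks used to exercise the model. */
void demo_write_super(struct model_super *sb)
{
    sb->writes++;
    sb->dirty = 0;
}

/* A lockfs hook that only freezes and does not rewrite the super. */
void demo_lockfs_freeze_only(struct model_super *sb)
{
    sb->locked = 1;
}

/* A lockfs hook that chooses to write the super again before freezing. */
void demo_lockfs_and_write(struct model_super *sb)
{
    demo_write_super(sb);
    sb->locked = 1;
}

const struct model_super_ops demo_ops_freeze_only = {
    .write_super = demo_write_super,
    .write_super_lockfs = demo_lockfs_freeze_only,
};

const struct model_super_ops demo_ops_write_again = {
    .write_super = demo_write_super,
    .write_super_lockfs = demo_lockfs_and_write,
};
```

Running model_sync_supers followed by model_sync_supers_lockfs over a dirty
super with each ops table shows the difference: the freeze-only FS is written
once, the other is written twice.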

> [...]
>> It could have been done to avoid the first call to sync_supers, but this
>> is not a speed sensitive function, and I wanted to make it really clear
>> this was fsync_dev + something extra (if the FS provided it).
> 
> In Tux2 write_super doesn't make any sense until after the dirty dcache
> entries are flushed and the dirty inodes are in turn flushed.  In fact,
> from Tux2's point of view, the whole oddly constructed flush mechanism
> that currently exists doesn't help - it just gets in the way.  For
> example, fsync will try to write all dirty buffers, even when it violates
> priority ordering constraints.  I simply disable that 'feature' - it's
> death for filesystem integrity.  For me, bdflush and friends are just a
> menace that has to be held at bay.  I don't see why this would be
> different for any other filesystem that pretends to close all windows of
> crash vulnerability, including yours.
> 

For the most part, reiserfs can play nice with bdflush.  I give it blocks
when I've decided they are ready to get to disk, and I keep blocks away
from it when they aren't allowed to be written.  Exactly how the blocks are
exchanged could improve, but bdflush as the flushing worker thread can be
useful.  Especially in 2.4, it is pretty easy to maintain your own
lists/kiobufs/whatever if you want to do all the work yourself.

There have been threads on i/o ordering recently, and that would really
clean things up.  Stephen, I'm assuming you have i/o ordering in mind for
your queue of 2.5 changes.  I'm more than willing to help code something.

> What we need is a sensible method/callback/library arrangement for the
> sync like we now have for read/write/mmap.  What we have now is far from
> sensible.  Syncing should be done one superblock at a time, not across
> the entire system like it is now.  IOW, it's currently sliced
> horizontally while it really needs to be sliced vertically.  We need
> a sync_filesystem method and it should default to a generic_sync_super
> that does the current dumb sync.  You should then put your improvements
> in as a method override, not just make the current messy arrangement
> even messier.
> 
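
The arrangement proposed in the quoted text could look roughly like the
userspace sketch below. The names sync_filesystem and generic_sync_super come
from the proposal itself; their signatures, the `model_sb` types,
`sync_one_filesystem` and `demo_ordered_sync` are assumptions made up for
illustration, since no such interface is defined in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; types and signatures are assumed, not kernel API. */
struct model_sb;

struct model_sb_ops {
    /* Proposed per-filesystem hook; NULL means "use the generic default". */
    int (*sync_filesystem)(struct model_sb *sb);
};

struct model_sb {
    const struct model_sb_ops *ops;
    int generic_syncs;  /* times the generic path ran for this super */
    int custom_syncs;   /* times the FS override ran for this super */
};

/* Default behaviour: stands in for "the current dumb sync". */
int generic_sync_super(struct model_sb *sb)
{
    sb->generic_syncs++;
    return 0;
}

/* Sync one superblock: prefer the FS override, else fall back. */
int sync_one_filesystem(struct model_sb *sb)
{
    assert(sb != NULL);
    if (sb->ops && sb->ops->sync_filesystem)
        return sb->ops->sync_filesystem(sb);
    return generic_sync_super(sb);
}

/* Walk every superblock, one filesystem at a time; return the first error. */
int sync_all_filesystems(struct model_sb **sbs, size_t n)
{
    int err = 0;

    for (size_t i = 0; i < n; i++) {
        int ret = sync_one_filesystem(sbs[i]);
        if (ret && !err)
            err = ret;
    }
    return err;
}

/* Hypothetical override for an FS that enforces its own write ordering. */
int demo_ordered_sync(struct model_sb *sb)
{
    sb->custom_syncs++;
    return 0;
}

const struct model_sb_ops demo_ordered_ops = {
    .sync_filesystem = demo_ordered_sync,
};

/* An FS that provides no override and so gets the generic behaviour. */
const struct model_sb_ops demo_plain_ops = {
    .sync_filesystem = NULL,
};
```

A filesystem with ordering constraints would install its own hook, while
everything else keeps the default without any change.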

I don't entirely disagree, but reiserfs could actually sync slower if it
were done one FS at a time.  write_super will commit the current transaction,
which will dirty a whole bunch of metadata buffers for writing.  So, by
calling write_super on every FS first, you have the chance to make better
use of the underlying devices.
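
The difference between the two orderings can be shown with a tiny userspace
trace. This is a sketch under stated assumptions: `C<n>` stands for the
write_super commit on filesystem n, `F<n>` for flushing the buffers that
commit dirtied, and every name in the code is hypothetical. It records only
the order of operations and measures nothing about actual device throughput.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_MAX 128

/* Records the order of operations as a space-separated string. */
struct trace {
    char buf[TRACE_MAX];
    size_t len;
};

/* Append one event such as "C0" or "F1" to the trace. */
static void trace_add(struct trace *t, char op, int fs)
{
    int n = snprintf(t->buf + t->len, TRACE_MAX - t->len, "%s%c%d",
                     t->len ? " " : "", op, fs);

    assert(n > 0 && (size_t)n < TRACE_MAX - t->len);
    t->len += (size_t)n;
}

/* Vertical slicing: commit and flush each filesystem in turn. */
void sync_one_fs_at_a_time(struct trace *t, int nfs)
{
    for (int i = 0; i < nfs; i++) {
        trace_add(t, 'C', i);  /* commit dirties metadata buffers */
        trace_add(t, 'F', i);  /* flush them before touching the next FS */
    }
}

/* Horizontal slicing: commit on every filesystem first, then flush. */
void sync_commit_all_first(struct trace *t, int nfs)
{
    for (int i = 0; i < nfs; i++)
        trace_add(t, 'C', i);  /* all devices now have work queued */
    for (int i = 0; i < nfs; i++)
        trace_add(t, 'F', i);
}
```

With two filesystems, the first function records the sequence C0 F0 C1 F1 and
the second records C0 C1 F0 F1, so in the second case both devices have dirty
buffers queued before any flushing starts.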

-chris
