Re: [RFC] api for consistent lvm snapshots

Chris Mason Thu, 05 Oct 2000 08:15:47 -0700


--On 10/05/00 13:49:31 +0200 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

> Chris Mason wrote:
>> 
>> For the most part, reiserfs can play nice with bdflush.  I give it blocks
>> when I've decided they are ready to get to disk, and I keep blocks away
>> from it when they aren't allowed to be written.
> 
> But why not give them straight to ll_rw_block?  

Because I don't want them sent to disk yet ;-)  Let them age a while in the
bdflush dirty list first.
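Here is a minimal sketch of letting dirty blocks "age" before writeback. It is a user-space model only: `struct dbuf`, `flush_aged()` and the `max_age` threshold are hypothetical names, not the 2.4 buffer cache's types or tunables. A kupdate-style pass submits only buffers that have been dirty for at least `max_age` ticks and leaves younger ones on the list, where further updates to the same block can still be absorbed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a dirty buffer; not the kernel's struct buffer_head. */
struct dbuf {
    long block;        /* block number on disk */
    long dirtied_at;   /* time (ticks) the buffer was marked dirty */
    int  dirty;        /* 1 while the buffer still needs writing */
};

/* kupdate-style pass: "write" only buffers that have been dirty for at
 * least max_age ticks; younger buffers stay dirty and keep aging.
 * Returns the number of buffers handed off for writing. */
static size_t flush_aged(struct dbuf *bufs, size_t n, long now, long max_age)
{
    size_t written = 0;
    assert(n == 0 || bufs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dirty && now - bufs[i].dirtied_at >= max_age) {
            bufs[i].dirty = 0;   /* stands in for handing it to the elevator */
            written++;
        }
    }
    return written;
}
```

With buffers dirtied at ticks 0, 25 and 5, a pass at tick 30 with `max_age` 20 writes the first and third and leaves the second dirty for a later pass.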

> Maybe the real question
> is, where does the elevator scheduling happen, in ll_rw_block or in
> bdflush?  I haven't checked.
> 

bdflush/kupdate decide when to send things to the elevator; the elevator
queues, merges, and sorts them for the disk.
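The next example is a simplified model of what the elevator does with the requests bdflush/kupdate hand it: sort them by sector and merge requests that are physically contiguous. `struct req` and `elevator_sort_merge()` are hypothetical. The kernel's elevator inserts requests one at a time and also has to bound starvation, and this sketch ignores both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request: a run of contiguous sectors. */
struct req {
    long sector;   /* first sector */
    long nr;       /* number of sectors */
};

static int cmp_req(const void *a, const void *b)
{
    const struct req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Sort queued requests by sector, then merge any request that starts
 * exactly where the previous one ends. Returns the new queue length. */
static size_t elevator_sort_merge(struct req *q, size_t n)
{
    assert(n == 0 || q != NULL);
    if (n == 0)
        return 0;
    qsort(q, n, sizeof *q, cmp_req);

    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (q[out].sector + q[out].nr == q[i].sector)
            q[out].nr += q[i].nr;   /* contiguous: merge into previous */
        else
            q[++out] = q[i];        /* gap: keep as a separate request */
    }
    return out + 1;
}
```

Queuing {100,8}, {0,8}, {8,8}, {200,8} yields three requests: sectors 0 to 15 (merged), 100 and 200.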

>> There have been threads on I/O ordering recently, and that would really
>> clean things up.  Stephen, I'm assuming you have I/O ordering in mind for
>> your queue of 2.5 changes; I'm more than willing to help code something.
> 
> I/O ordering constraints are complex for journalling filesystems, simple
> for Tux2.  Tux2 blocks are always partitioned into two groups, plus two
> metaroots for ordering purposes, and the relationship is simple:  write
> all of the first group; then its metaroot; let the second group become
> the first group; wait for a new second group to appear; repeat as
> necessary.  No outside mechanism is needed to assist this.
> 

Do you have to wait for the metaroot to reach disk before you can allow the
second group to become the first group?
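The two-group cycle described above can be modelled roughly as follows. This is a hypothetical user-space sketch, not Tux2 code. It assumes the two metaroots are written alternately and records writes in a log instead of doing I/O. Because it does not model write completion, it does not settle whether the swap has to wait for the metaroot to reach disk.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_MAX 8
#define LOG_MAX   64

/* Hypothetical model of the two-group cycle; not Tux2's data structures. */
struct phase_state {
    int first[GROUP_MAX];    /* blocks in the group being committed */
    int second[GROUP_MAX];   /* blocks dirtied since that group closed */
    size_t nfirst, nsecond;
    int which_root;          /* 0 or 1: metaroot to write next (assumed to alternate) */
    int log[LOG_MAX];        /* order in which writes were issued */
    size_t nlog;
};

/* Record a write; metaroot writes are logged as -1 (root 0) or -2 (root 1). */
static void issue_write(struct phase_state *s, int blk)
{
    assert(s->nlog < LOG_MAX);
    s->log[s->nlog++] = blk;
}

/* A block changed: it joins the second group. */
static void dirty_block(struct phase_state *s, int blk)
{
    assert(s->nsecond < GROUP_MAX);
    s->second[s->nsecond++] = blk;
}

/* One cycle: write all of the first group, then its metaroot; the
 * second group becomes the first, and a new second group starts empty. */
static void commit_phase(struct phase_state *s)
{
    for (size_t i = 0; i < s->nfirst; i++)
        issue_write(s, s->first[i]);
    issue_write(s, -1 - s->which_root);
    s->which_root ^= 1;

    for (size_t i = 0; i < s->nsecond; i++)
        s->first[i] = s->second[i];
    s->nfirst = s->nsecond;
    s->nsecond = 0;
}
```

In this model no block of a later group is ever logged before the metaroot of the group ahead of it, which is the whole ordering constraint.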

>> > What we need is a sensible method/callback/library arrangement for the
>> > sync like we now have for read/write/mmap.  What we have now is far
>> > from sensible.  Syncing should be done one superblock at a time, not
>> > across the entire system like it is now.  IOW, it's currently sliced
>> > horizontally while it really needs to be sliced vertically.  We need
>> > a sync_filesystem method and it should default to a
>> > generic_sync_super that does the current dumb sync.  You should then
>> > put your improvements in as a method override, not just make the
>> > current messy arrangement even messier.
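The proposed per-superblock method with a generic default amounts to an optional function pointer plus a fallback. In this hypothetical sketch, only `sync_filesystem` and `generic_sync_super` come from the proposal. `struct sb_ops`, `struct sb_model`, `sync_one()`, `sync_all()` and the counters are invented for illustration and are not the kernel's `super_operations` or `super_block`.

```c
#include <assert.h>
#include <stddef.h>

struct sb_model;

/* Optional per-filesystem method; NULL means "use the generic sync". */
struct sb_ops {
    int (*sync_filesystem)(struct sb_model *sb);
};

/* Simplified stand-in for a superblock, not the kernel's struct super_block. */
struct sb_model {
    const struct sb_ops *ops;
    int generic_syncs;   /* counters so the dispatch can be observed */
    int custom_syncs;
};

/* Default: stands in for the current "dumb" sync, applied to one FS. */
static int generic_sync_super(struct sb_model *sb)
{
    sb->generic_syncs++;
    return 0;
}

/* Use the filesystem's override if it has one, else the default. */
static int sync_one(struct sb_model *sb)
{
    assert(sb != NULL);
    if (sb->ops && sb->ops->sync_filesystem)
        return sb->ops->sync_filesystem(sb);
    return generic_sync_super(sb);
}

/* Sliced "vertically": finish each superblock before moving to the next.
 * Returns the first error seen, if any. */
static int sync_all(struct sb_model **sbs, size_t n)
{
    int err = 0;
    for (size_t i = 0; i < n; i++) {
        int r = sync_one(sbs[i]);
        if (r != 0 && err == 0)
            err = r;
    }
    return err;
}

/* Example override a journalling FS might install (hypothetical). */
static int journal_sync(struct sb_model *sb)
{
    sb->custom_syncs++;
    return 0;
}

static const struct sb_ops journal_ops = { journal_sync };
```

As the reply below points out, walking superblocks strictly in turn like `sync_all()` does is exactly the ordering that could make reiserfs sync slower.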
>> 
>> I don't entirely disagree, but reiserfs could actually sync slower if it
>> was done an FS at a time.  write_super will commit the current
>> transaction, which will dirty a whole bunch of metadata buffers for
>> writing.  So, by calling write_super on every FS first, you have the
>> chance to make better use of the underlying devices.
> 
> I don't see how you make better use of anything.  

Which is better:

ll_rw_block(WRITE, 1, &buffer1);
/* some things that can schedule/take a long time */
ll_rw_block(WRITE, 1, &buffer2);
/* more things that can schedule/take a long time */
ll_rw_block(WRITE, 1, &buffer3);
/* etc. */

Or:

ll_rw_block(WRITE, 1, &buffer1);
ll_rw_block(WRITE, 1, &buffer2);
ll_rw_block(WRITE, 1, &buffer3);

/* things that schedule/take a long time */

For reiserfs, the current code for fsync_dev and friends results in the
second pattern, while doing the whole sync operation one FS at a time would
result in the first.  The current code should perform better all around: it
tends to keep the dirty lists filled, so the elevator has more requests to
merge and sort at once.
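Here is a toy model of why the second pattern helps. If each request reaches an idle queue, it is serviced in submission order. If several are queued before the disk starts, the elevator can service them in sector order. The helpers are hypothetical, seek cost is just the distance the head moves, and merging and rotational latency are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_REQS 64

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Total head movement when servicing sectors in the given order. */
static long seek_cost(const long *order, size_t n, long head)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += labs(order[i] - head);
        head = order[i];
    }
    return total;
}

/* First pattern: each request is dispatched alone, in submission order. */
static long interleaved_cost(const long *sectors, size_t n, long head)
{
    return seek_cost(sectors, n, head);
}

/* Second pattern: all requests are queued first, then serviced in
 * ascending sector order (a one-way sweep when the head starts below them). */
static long batched_cost(const long *sectors, size_t n, long head)
{
    long sorted[MAX_REQS];
    assert(n <= MAX_REQS);
    for (size_t i = 0; i < n; i++)
        sorted[i] = sectors[i];
    qsort(sorted, n, sizeof sorted[0], cmp_long);
    return seek_cost(sorted, n, head);
}
```

For sectors {900, 100, 500} with the head at 0, interleaved submission moves the head 900 + 800 + 400 = 2100 sectors, while the sorted batch moves it 100 + 400 + 400 = 900.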

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]