--On 10/06/00 04:29:04 +0200 Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Chris Mason wrote:
>> --On 10/05/00 13:49:31 +0200 Daniel Phillips wrote:
>> > Chris Mason wrote:
>> >>
>> >> For the most part, reiserfs can play nice with bdflush. I give it
>> >> blocks when I've decided they are ready to get to disk, and I keep
>> >> blocks away from it when they aren't allowed to be written.
>> >
>> > But why not give them straight to ll_rw_block?
>>
>> Because I don't want them sent to disk yet ;-) Let them age a while in
>> the bdflush dirty list first.
>
> I'm just trying to get it straight. You *can* write them now, but
> you're not necessarily in a big hurry to, somebody might be able to
> write them again if they hang around, and the VM can age them instead of
> you, right?
>
Yes, or they might be logged again. The main reason write-ahead logging
doesn't slow things down as much as you'd expect is that frequently
logged blocks end up being written only to the log.
> In Tux2's case nobody is allowed to write to a dirty buffer once it has
> entered the recording phase (otherwise you would contaminate the
> recorded tree) so it doesn't make sense to do anything else than feed it
> straight to ll_rw_block.
>
Makes sense.
[ ... ]
>> > I/O ordering constraints are complex for journalling filesystems,
>> > simple for Tux2. Tux2 blocks are always partitioned into two groups,
>> > plus two metaroots for ordering purposes, and the relationship is
>> > simple: write all of the first group; then its metaroot; let the
>> > second group become the first group; wait for a new second group to
>> > appear; repeat as necessary. No outside mechanism is needed to assist
>> > this.
>>
>> Do you have to wait for the metaroot to reach disk before you can allow
>> the second group to become the first group?
>
> Yes, assuming you mean:
>
> first group <= branching phase
> second group <= recording phase
>
> So I have the priority ordering:
>
> blocks(i) -> root(i) -> blocks(i+1) -> root(i+1) -> etc
>
> And it would be possible to compress that slightly to:
>
> root(i-1) + blocks(i) -> root(i) + blocks(i+1) -> etc
>
Then the I/O barriers would benefit you as well. Anywhere you
wait_on_buffer because that buffer has to hit disk before you can
proceed is a performance hit. Barriers won't fix many bookkeeping
problems, but they will make it easier to keep things flowing to disk.
[ benefits of current fsync_dev ordering ]
> Exactly what I was thinking: this situation is the result of fsync_dev
> and friends imposing their dumb-filesystem-friendly view of the world on
> everybody. Wouldn't it be *way* better to start a kernel thread for
> each sb:
>
> for each sb: kernel_thread (sb->sync, sb, threadflags);
>
> And then wait for the threads to complete? Lots of things would be
> cleaner then, and one *big* thing would happen: VFS can detect and
> handle a stuck filesystem intelligently for a change.
>
Ok, as long as we understand why the current method is good, we can talk
about ways to make it better ;-) Instead of forking a thread for each
sync, I'd rather let the FS start a thread at mount time and use that
for syncing. Either way, it's something that could be experimented with,
to see whether the cost/complexity is worth it.
-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]