Hello, > On Mon, 2005-04-04 at 10:04, Jan Kara wrote: > > > In log_do_checkpoint() we go through the t_checkpoint_list of a > > transaction and call __flush_buffer() on each buffer. Suppose there is > > just one buffer on the list and it is dirty. __flush_buffer() sees it and > > puts it to an array of buffers for flushing. Then the loop finishes, > > retry=0, drop_count=0, batch_count=1. So __flush_batch() is called - we > > drop all locks and sleep. While we are sleeping somebody else comes and > > makes the buffer dirty again (OK, that is not probable, but I think it > > could be possible). > > Yes, there's no reason why not at that point. > > > Now we wake up and call __cleanup_transaction(). > > It's not able to do anything and returns 0. > > I think the _right_ answer here is to have two separate checkpoint > lists: the current one, plus one for which the checkpoint write has > already been submitted. That way, we can wait for IO completion on > submitted writes without (a) getting conned into doing multiple rewrites > if there's somebody else dirtying the buffer; or (b) getting confused > about how much progress we're making. Buffers on the pre-write list get > written; buffers on the post-write list get waited for; and both count > as progress (eliminating the false assert-failure when we failed to > detect progress). Yes, this seems to be a better long-term solution. A hotfix (retrying after __flush_batch()) is attached if somebody is interested - it should be safe and is lightly tested.
> The prevention of multiple writes in this case should also improve > performance a little. > > That ought to be pretty straightforward, I think. The existing cases > where we remove buffers from a checkpoint shouldn't have to care about > which list_head we're removing from; those cases already handle buffers > in both states. It's only when doing the flush/wait that we have to > distinguish the two. Yes, AFAICS the changes should remain local to the checkpointing code (plus __unlink_buffer()). Should I write the patch or will you? Honza -- Jan Kara <[EMAIL PROTECTED]> SuSE CR Labs
Fix possible false assertion failure in JBD checkpointing code. Signed-off-by: Jan Kara <[EMAIL PROTECTED]> diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-rc2/fs/jbd/checkpoint.c linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c --- linux-2.6.12-rc2/fs/jbd/checkpoint.c 2005-03-03 18:58:29.000000000 +0100 +++ linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c 2005-04-05 13:26:42.000000000 +0200 @@ -339,8 +339,10 @@ int log_do_checkpoint(journal_t *journal } } while (jh != last_jh && !retry); - if (batch_count) + if (batch_count) { __flush_batch(journal, bhs, &batch_count); + retry = 1; + } /* * If someone cleaned up this transaction while we slept, we're