On Mar 18, 2025, at 22:57, Kent Overstreet <[email protected]> wrote:
>
> On Tue, Mar 18, 2025 at 10:26:03AM -0400, John Stoffel wrote:
>>>>>>> "Kent" == Kent Overstreet <[email protected]> writes:
>>
>>> On Mon, Mar 17, 2025 at 04:58:26PM -0400, John Stoffel wrote:
>>>>>>>>> "Alan" == Alan Huang <[email protected]> writes:
>>>>
>>>>> Now there are 16 journal buffers, 8 is too small to be enough.
>>>>> Signed-off-by: Alan Huang <[email protected]>
>>>>> ---
>>>>>  fs/bcachefs/recovery.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>>> diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
>>>>> index 71c786cdb192..a6e26733854d 100644
>>>>> --- a/fs/bcachefs/recovery.c
>>>>> +++ b/fs/bcachefs/recovery.c
>>>>> @@ -899,7 +899,7 @@ int bch2_fs_recovery(struct bch_fs *c)
>>>>>       * journal sequence numbers:
>>>>>       */
>>>>>      if (!c->sb.clean)
>>>>> -        journal_seq += 8;
>>>>> +        journal_seq += JOURNAL_BUF_NR * 4;
>>>>
>>>> Instead of magic numbers, could you put in a define with an
>>>> explanation of how you arrived at this number? Just to document the
>>>> assumptions better?
>>>>
>>>> John
>>
>>> The * 4 is a fudge factor.
>>
>> Ok.
>>
>>> But actually, I was giving this more thought and I don't think we have
>>> the correct number.
>>
>> We have a WAG here. :-)
>>
>>> The real bound is "number of unflushed journal entries that might have
>>> been allocated, and have other items (btree nodes) referring to that
>>> sequence number, but which don't hit because they weren't
>>> flushed".
>>
>>> And we don't have an actual bound on that.
>>
>> So what happens if journal_seq overflows? I don't know the code and
>> haven't looked.
>
> In the olden days, in the before times, blacklisting insufficient
> sequence numbers would mean btree node entries could become visible
> after a crash that should've been ignored, because they were newer than
> the newest (flush) journal entry.
>
> That can't happen anymore because pointers to btree nodes now indicate
> how many sectors we've written and completed within that node, and
> they're updated after every btree node write (log append).
>
> IOW - our mechanism for sequential consistency now is that the b-tree
> behaves like a pure COW btree.
>
> So I retract what I said before :) Having just checked the journal read
> path, and the recovery path that creates the blacklist entries, I think
> we're good.
>
> The only reason we'd need to blacklist more is if we want some kind of a
> double check on the modern sequential consistency mechanism, and I don't
> think we need that.

What about sequential consistency across multiple btrees? That is, the case where one btree completes its entire update but another one doesn't.
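
To make the question concrete, here is a toy model of the case I have in mind. It is only a sketch: every name in it (toy_btree, toy_publish, and so on) is made up for illustration and none of it is bcachefs code. It assumes each btree has a single node whose parent pointer records how many sectors are known to be written, and it deliberately leaves out the journal, so it says nothing about whether journal replay already covers this case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are bcachefs identifiers. */
#define TOY_NODE_SECTORS 8

struct toy_node {
	uint64_t data[TOY_NODE_SECTORS]; /* one key per "sector", appended log-style */
	unsigned sectors_on_disk;        /* sectors physically appended to the node */
};

struct toy_ptr {
	unsigned sectors_written;        /* sectors the pointer to the node vouches for */
};

struct toy_btree {
	struct toy_node node;
	struct toy_ptr  ptr;
};

/* Step 1 of a node write: append one sector of data to the node. */
static void toy_append(struct toy_btree *b, uint64_t key)
{
	assert(b->node.sectors_on_disk < TOY_NODE_SECTORS);
	b->node.data[b->node.sectors_on_disk++] = key;
}

/* Step 2: once the write has completed, publish it through the pointer. */
static void toy_publish(struct toy_btree *b)
{
	b->ptr.sectors_written = b->node.sectors_on_disk;
}

/* After a crash, a reader trusts only the sectors the pointer covers. */
static bool toy_visible(const struct toy_btree *b, uint64_t key)
{
	for (unsigned i = 0; i < b->ptr.sectors_written; i++)
		if (b->node.data[i] == key)
			return true;
	return false;
}

/*
 * One logical update touching two btrees. crash_between simulates losing
 * power after the first btree was published but before the second was.
 */
static void toy_update_two(struct toy_btree *a, struct toy_btree *b,
			   uint64_t key, bool crash_between)
{
	toy_append(a, key);
	toy_append(b, key);
	toy_publish(a);
	if (crash_between)
		return;
	toy_publish(b);
}
```

With crash_between set, toy_visible() finds the key in the first btree but not in the second, even though each btree on its own looks like a consistent COW btree. That is the state I am asking about.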
