On Mar 18, 2025, at 22:57, Kent Overstreet <[email protected]> wrote:
> 
> On Tue, Mar 18, 2025 at 10:26:03AM -0400, John Stoffel wrote:
>>>>>>> "Kent" == Kent Overstreet <[email protected]> writes:
>> 
>>> On Mon, Mar 17, 2025 at 04:58:26PM -0400, John Stoffel wrote:
>>>>>>>>> "Alan" == Alan Huang <[email protected]> writes:
>>>> 
>>>>> Now there are 16 journal buffers, 8 is too small to be enough.
>>>>> Signed-off-by: Alan Huang <[email protected]>
>>>>> ---
>>>>> fs/bcachefs/recovery.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>>> diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
>>>>> index 71c786cdb192..a6e26733854d 100644
>>>>> --- a/fs/bcachefs/recovery.c
>>>>> +++ b/fs/bcachefs/recovery.c
>>>>> @@ -899,7 +899,7 @@ int bch2_fs_recovery(struct bch_fs *c)
>>>>> * journal sequence numbers:
>>>>> */
>>>>> if (!c->sb.clean)
>>>>> - journal_seq += 8;
>>>>> + journal_seq += JOURNAL_BUF_NR * 4;
>>>> 
>>>> Instead of magic numbers, could you put in a define with an
>>>> explanation of how you arrived at this number?  Just to document the
>>>> assumptions better?
>>>> 
>>>> John
>> 
>>> The * 4 is a fudge factor.
>> 
>> Ok.
>> 
>>> But actually, I was giving this more thought and I don't think we have
>>> the correct number.
>> 
>> We have a WAG here.  :-)
>> 
>>> The real bound is "number of unflushed journal entries that might have
>>> been allocated, and have other items (btree nodes) referring to that
>>> sequence number, but which don't hit the disk because they weren't
>>> flushed".
>> 
>>> And we don't have an actual bound on that.
>> 
>> So what happens if journal_seq overflows?  I don't know the code and
>> haven't looked.   
> 
> In the olden days, in the before times, blacklisting insufficient
> sequence numbers would mean btree node entries could become visible
> after a crash that should've been ignored, because they were newer than
> the newest (flush) journal entry.
> 
> That can't happen anymore because pointers to btree nodes now indicate
> how many sectors we've written and completed within that node, and
> they're updated after every btree node write (log append).
> 
> IOW - our mechanism for sequential consistency now is that the b-tree
> behaves like a pure COW btree.
> 
> So I retract what I said before :) Having just checked the journal read
> path, and the recovery path that creates the blacklist entries, I think
> we're good.
> 
> The only reason we'd need to blacklist more is if we want some kind of a
> double check on the modern sequential consistency mechanism, and I don't
> think we need that.

What about multi-btree sequential consistency?

i.e. one btree completes its part of the tree update, but the other doesn't?



