>>>>> "Kent" == Kent Overstreet <[email protected]> writes:

> On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote:
>> >>>>> "Kent" == Kent Overstreet <[email protected]> writes:
>> 
>> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote:
>> >> From: Manas <[email protected]>
>> >> 
>> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e.
>> >> 
>> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in
>> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is
>> >> 63 '?'. This triggers both the assertions.
>> 
>> > The BUG_ON() doesn't need to be deleted; we just need to fix the
>> > validation so it doesn't fire (it doesn't particularly matter if it's
>> > removed or not, ubsan would catch it without the BUG_ON()).
>> 
>> Shouldn't the BUG_ON() be replaced with making the filesystem readonly
>> instead if you're going to keep it?  I'd rather have the filesystem
>> still be mounted and able to be read, but not writeable, instead of
>> having my system crash before I can do anything.  

> Not in this case. In general, if there's a chance of the BUG_ON()
> hitting in the wild after the code has passed testing then it
> shouldn't be a BUG_ON(), but this is low level validation that we're
> relying on.

So I'm having a hard time parsing this reply.  And I don't think you
make a good case for leaving or even having a BUG_ON() here at all.
If there's a chance of it hitting in the wild, it should be removed.
But you don't want to remove it because it will never hit?  That's
just lazy... :-)   I just don't see why a filesystem should have the
opportunity to kill an entire system because that one filesystem has
problems.  

> In general higher level code absolutely requires that the low level
> validation is correct, because if it's not that will trigger all sorts
> of undefined behaviour in the upper layers.

> By "low level validation" I mean _only_ the validate functions that
> check simple invariants within a single data type that is read or
> written atomically to disk - "is data type garbage or not".

Sure.

> If it's an invariant that involves multiple objects - "does this
> extent refer to a valid device/snapshot ID" - that's not something
> we can check in validate, because then an extent or what have you
> would become invalid depending on what happens in the rest of the
> filesystem. Those sorts of checks do in fact need proper error
> paths.

Right. 

Reply via email to