>>>>> "Kent" == Kent Overstreet <[email protected]> writes:
> On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote: >> >>>>> "Kent" == Kent Overstreet <[email protected]> writes: >> >> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote: >> >> From: Manas <[email protected]> >> >> >> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e. >> >> >> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in >> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is >> >> 63 '?'. This triggers both the assertions. >> >> > The BUG_ON() doesn't need to be deleted; we just need to fix the >> > validation so it doesn't fire (it doesn't particularly matter if it's >> > removed or not, ubsan would catch it without the BUG_ON()). >> >> Shouldn't the BUG_ON() be replaced with making the filesystem readonly >> instead if you're going to keep it? I'd rather have the filesystem >> still be mounted and able to be read, but not writeable, instead of >> having my system crash before I can do anything. > Not in this case. In general, if there's a chance of the BUG_ON() > hitting in the wild after the code has passed testing then it > shouldn't be a BUG_ON(), but this is low level validation that we're > relying on. So I'm having a hard time parsing this reply. And I don't think you make a good case for leaving or even having a BUG_ON() here at all. If there's a chance of it hitting in the wild, it should be removed. But you don't want to remove it because it will never hit? That's just lazy... :-) I just don't see why a filesystem should have the opportunity to kill an entire system because that one filesystem has problems. > In general higher level code absolutely requires that the low level > validation is correct, because if it's not that will trigger all sorts > of undefined behaviour in the upper layers. > By "low level validation" I mean _only_ the validate functions that > check simple invariants within a single data type that is read or > written atomically to disk - "is data type garbage or not". Sure. > If it's an invariant that involves multiple objects - "does this > extent refer to a valid device/snapshot ID" - that's not something > we can check in validate, because then an extent or what have you > would become invalid depending on what happens in the rest of the > filesystem. Those sorts of checks do in fact need proper error > paths. Right.
