On Tue, Jun 17, 2025 at 10:04 PM Kent Overstreet <[email protected]> wrote:
>
> On Tue, Jun 17, 2025 at 09:41:20PM +0800, Julian Sun wrote:
> > Recently, syzkaller reported the following issue:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > Call Trace:
> >  <TASK>
> >  mempool_alloc_noprof+0x1a7/0x510 mm/mempool.c:402
> >  bch2_btree_update_start+0x549/0x1480 fs/bcachefs/btree_update_interior.c:1194
> >  bch2_btree_node_rewrite+0x17e/0x1120 fs/bcachefs/btree_update_interior.c:2208
> >  bch2_move_btree+0x6f0/0xc70 fs/bcachefs/move.c:1093
> >  bch2_scan_old_btree_nodes+0x95/0x240 fs/bcachefs/move.c:1215
> >  bch2_data_job+0x646/0x910 fs/bcachefs/move.c:1354
> >  bch2_data_thread+0x8f/0x1d0 fs/bcachefs/chardev.c:315
> >  kthread+0x711/0x8a0 kernel/kthread.c:464
> >  ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >
> > This is because after commit d4d71b58e513 ("bcachefs: RO mounts now use
> > less memory"), read-only mounts no longer initialize
> > btree_interior_update_pool, which is required for processing
> > BCH_IOCTL_DATA requests.
>
> Alan already gave me a better fix for this. You pretty much never want
> to just check if the filesystem is ro or rw - that would be racy, that
> can change at any time. If you need the filesystem to be rw, you do it
> by getting a write ref (which may fail).
>
> Just checking SB_RDONLY here would be "technically" correct since we
> only need the mempool, which is never deallocated until filesystem
> teardown, and the interior update path should get its own ref on
> c->writes before doing anything serious.
>
> But it's bad form, because then other code changes might go "ok, we've
> checked that we're RW, we're safe" - but we're actually not.
>
> And, I'm just now noticing that bch2_btree_update_start() actually does
> not get a ref on c->writes, so we might want to fix that - or move.c
> needs to be getting a write ref, or both.
>
> c->writes is a percpu refcount, so it's dirt cheap, there's generally
> zero downside to taking a ref even if an upper layer already has one.
> The only exception is if it's an internal operation that needs to run
> when we're going RO - but we have a flag for that,
> BCH_TRANS_COMMIT_no_check_rw, which bch2_btree_update_start() can check.
>
> The other consideration with write refs is that we don't want to be
> holding them for an unbounded duration, because that will block going RO
> - so I think bch2_ioctl_data() actually wasn't the best place for this,
> we should be checking if we're RW in move.c, every time we kick off an
> op.

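For illustration, here is a minimal userspace sketch of the pattern described
above: take a write ref per operation (which may fail) instead of checking
an RO/RW flag once up front. This is only a model under stated assumptions.
All names (model_fs, model_write_ref_tryget, model_write_ref_put,
model_run_ops) are hypothetical and are not the bcachefs API, and a plain
C11 atomic counter stands in for the percpu refcount behind c->writes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a filesystem; NOT the real struct bch_fs. */
struct model_fs {
	atomic_long writes;	/* outstanding write refs (stand-in for c->writes) */
	atomic_bool rw;		/* cleared once the fs starts going read-only */
};

static void model_fs_init(struct model_fs *fs, bool rw)
{
	atomic_init(&fs->writes, 0);
	atomic_init(&fs->rw, rw);
}

/* Drop a write ref previously obtained with model_write_ref_tryget(). */
static void model_write_ref_put(struct model_fs *fs)
{
	long old = atomic_fetch_sub(&fs->writes, 1);

	assert(old > 0);
	(void)old;
}

/*
 * Try to take a write ref. The ref is taken first and the rw flag is
 * checked afterwards, so a going-RO path that clears rw and then waits
 * for writes to reach zero cannot miss a ref that was granted.
 */
static bool model_write_ref_tryget(struct model_fs *fs)
{
	atomic_fetch_add(&fs->writes, 1);
	if (!atomic_load(&fs->rw)) {
		model_write_ref_put(fs);
		return false;
	}
	return true;
}

/*
 * Hypothetical move loop: each op takes and drops its own ref, so no ref
 * is held across the whole job. go_ro_after simulates a concurrent
 * transition to RO just before op number go_ro_after (-1 for never).
 * Returns the number of ops that ran.
 */
static int model_run_ops(struct model_fs *fs, int nr_ops, int go_ro_after)
{
	int done = 0;

	for (int i = 0; i < nr_ops; i++) {
		if (i == go_ro_after)
			atomic_store(&fs->rw, false);
		if (!model_write_ref_tryget(fs))
			break;		/* fs went RO: stop cleanly */
		done++;			/* the actual op would run here */
		model_write_ref_put(fs);
	}
	return done;
}
```

In this model, a one-time RW check before the loop would have let the later
ops run after the filesystem had gone RO, while the per-op ref stops the
loop at the first op that can no longer get one, and no ref is left held
that would delay the RO transition.
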
Thanks for your detailed explanation, this makes sense to me.

-- 
Julian Sun <[email protected]>
