On Wed, Feb 05, 2025 at 06:56:19AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 04, 2025 at 04:34:18PM -0800, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    0de63bb7d919 Merge tag 'pull-fix' of git://git.kernel.org/..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=10faf5f8580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=1909f2f0d8e641ce
> > dashboard link: https://syzkaller.appspot.com/bug?extid=80e5d6f453f14a53383a
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16b69d18580000
> > 
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-0de63bb7.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/1142009a30a7/vmlinux-0de63bb7.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/5d9e46a8998d/bzImage-0de63bb7.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/526692501242/mount_0.gz
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> > 
> >  slab radix_tree_node start ffff88803bf382c0 pointer offset 24 size 576
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > #PF: supervisor instruction fetch in kernel mode
> > #PF: error_code(0x0010) - not-present page
> > PGD 0 P4D 0 
> > Oops: Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI
> > CPU: 0 UID: 0 PID: 5705 Comm: syz-executor Not tainted 6.14.0-rc1-syzkaller-00020-g0de63bb7d919 #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > RIP: 0010:0x0
> > Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> > RSP: 0018:ffffc90000007bd8 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: 1ffff110077e705c RCX: 23438dd059a4b100
> > RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff88803bf382d8
> > RBP: ffffc90000007e10 R08: ffffffff819f146c R09: 1ffff11003f8519a
> > R10: dffffc0000000000 R11: 0000000000000000 R12: ffffffff81a6d507
> > R13: ffff88803bf382e0 R14: 0000000000000000 R15: ffff88803bf382d8
> > FS:  0000555567992500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffffffffffffd6 CR3: 000000004da38000 CR4: 0000000000352ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <IRQ>
> >  rcu_do_batch kernel/rcu/tree.c:2546 [inline]
> 
> The usual way that this happens is that someone clobbers the rcu_head
> structure of something that has been passed to call_rcu().  The most
> popular way of clobbering this structure is to pass the same something to
> call_rcu() twice in a row, but other creative arrangements are possible.
> 
> Building your kernel with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y will usually
> catch call_rcu() being invoked twice in a row on the same object.

I don't think it's that: syzbot's .config already has that option enabled,
and KASAN too.
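
For reference, here is a toy userspace model of the failure mode described above. It uses a simplified tail-pointer callback list, not the kernel's real callback list, and all toy_* names are hypothetical. Queuing the same head twice silently drops whatever was queued in between. A per-head "queued" flag, loosely analogous to the state CONFIG_DEBUG_OBJECTS_RCU_HEAD tracks, is enough to catch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head; NOT the kernel definition. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *);
	bool queued;	/* loosely mimics debug-objects state tracking */
};

/* Simplified singly linked callback list with a tail pointer. */
struct toy_cblist {
	struct toy_head *head;
	struct toy_head **tail;
};

static void toy_cblist_init(struct toy_cblist *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Unchecked append: queuing the same head twice corrupts the list. */
static void toy_call_rcu(struct toy_cblist *l, struct toy_head *h,
			 void (*func)(struct toy_head *))
{
	h->func = func;
	h->next = NULL;		/* re-queuing resets next and cuts the list */
	*l->tail = h;
	l->tail = &h->next;
}

/* Checked append: refuse a head that is already queued. */
static bool toy_call_rcu_checked(struct toy_cblist *l, struct toy_head *h,
				 void (*func)(struct toy_head *))
{
	if (h->queued)
		return false;	/* a debug-objects build would warn here */
	h->queued = true;
	toy_call_rcu(l, h, func);
	return true;
}

/* Walk the list like a batch of ready callbacks; returns count invoked. */
static int toy_do_batch(struct toy_cblist *l)
{
	int n = 0;
	struct toy_head *h = l->head;

	while (h) {
		struct toy_head *next = h->next;

		h->queued = false;
		/* A NULL func here is what "RIP: 0010:0x0" looks like in the kernel. */
		assert(h->func != NULL);
		h->func(h);
		n++;
		h = next;
	}
	toy_cblist_init(l);
	return n;
}

static int invoked;

static void count_cb(struct toy_head *h)
{
	(void)h;
	invoked++;
}
```

Queuing A, B, A with the unchecked path invokes only one callback, so B is lost. With the checked path, the second queuing of A is rejected and both callbacks run.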

And the only place we call call_rcu() is rcu_pending.c, where we have a
rearming RCU callback; but we track whether it's outstanding, and we do
all relevant operations with the lock held.
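
A rearming callback guarded by an "armed" flag can be sketched as follows. This is a hypothetical illustration, not the actual rcu_pending.c code. It assumes a single pending structure, a pthread mutex standing in for the kernel lock, and a counter standing in for call_rcu()/call_srcu().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical pending-work structure; names are illustrative only. */
struct toy_pending {
	pthread_mutex_t lock;
	bool cb_armed;		/* is our callback currently outstanding? */
	int nr_items;		/* items waiting for a grace period */
	int nr_queued;		/* times the callback was handed to "call_rcu" */
};

/* Stand-in for call_rcu()/call_srcu(): only records that we queued. */
static void toy_queue_cb(struct toy_pending *p)
{
	p->nr_queued++;
}

/* Add an item; arm the callback only if it is not already outstanding. */
static void toy_pending_add(struct toy_pending *p)
{
	pthread_mutex_lock(&p->lock);
	p->nr_items++;
	if (!p->cb_armed) {
		p->cb_armed = true;
		toy_queue_cb(p);
	}
	pthread_mutex_unlock(&p->lock);
}

/*
 * Callback body: retire nr_done items, then either rearm (work remains)
 * or clear the flag. Both decisions are made under the same lock as the
 * add path, so the callback is never queued while already outstanding.
 */
static void toy_pending_cb(struct toy_pending *p, int nr_done)
{
	pthread_mutex_lock(&p->lock);
	assert(p->cb_armed);	/* running while unarmed would be a bug */
	p->nr_items -= nr_done;
	if (p->nr_items > 0)
		toy_queue_cb(p);	/* stay armed and requeue exactly once */
	else
		p->cb_armed = false;
	pthread_mutex_unlock(&p->lock);
}
```

The flag is tested and set under the same lock on both paths. That leaves no window in which this code could hand the same callback to call_rcu() a second time while the first is still pending.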

And we only use rcu_pending.c with SRCU, not regular RCU.

We do use kfree_rcu() in a few places (all boring, I expect), but that
doesn't (generally?) go through the RCU callback list.

So I'm not sure this is even a bcachefs bug.
