Response below

Van: Konstantin Belousov <kostik...@gmail.com>
Datum: zondag, 12 november 2023 19:47
Aan: Alexander Motin <m...@freebsd.org>
CC: Ronald Klop <ronald-li...@klop.ws>, curr...@freebsd.org
Onderwerp: Re: crash zfs_clone_range()

On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote:
> Hi Ronald,
>
> As I can see, the clone request to ZFS came through nullfs, and it crashed
> immediately on enter.  I've never been a VFS layer expert, but to me it may
> be a nullfs problem, not zfs.  Is there chance you was (un-)mounting
> something when this happened?
It is not nullfs issue, I believe, but the lack of the busy reference on the
upper mount.  I think https://reviews.freebsd.org/D42554 should cover it.

>
> On 10.11.2023 05:12, Ronald Klop wrote:
> > Hi,
> >
> > Had this crash today on RPI4/15-CURRENT.
> >
> > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19
> > main-b0203aaa46-dirty: Sat Nov  4 11:48:33 CET 2023     
ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG
> > arm64
> >
> > $ sysctl -a | grep bclon
> > vfs.zfs.bclone_enabled: 1
> >
> > I started a jail with poudriere to build a package. The jail uses null
> > mounts over ZFS.
> >
> > [root]# cu -s 115200 -l /dev/cuaU0
> > Connected
> >
> > db> bt
> > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900
> > db_trace_self() at db_trace_self
> > db_stack_trace() at db_stack_trace+0x120
> > db_command() at db_command+0x2e4
> > db_command_loop() at db_command_loop+0x58
> > db_trap() at db_trap+0x100
> > kdb_trap() at kdb_trap+0x334
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0xf2000000
> > kdb_enter() at kdb_enter+0x48
> > vpanic() at vpanic+0x1dc
> > panic() at panic+0x48
> > data_abort() at data_abort+0x2fc
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0x96000004
> > rms_rlock() at rms_rlock+0x1c
> > zfs_clone_range() at zfs_clone_range+0x68
> > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> > null_bypass() at null_bypass+0x118
> > vn_copy_file_range() at vn_copy_file_range+0x18c
> > kern_copy_file_range() at kern_copy_file_range+0x36c
> > sys_copy_file_range() at sys_copy_file_range+0x8c
> > do_el0_sync() at do_el0_sync+0x634
> > handle_el0_sync() at handle_el0_sync+0x48
> > --- exception, esr 0x56000000
> >
> >
> > Oh.. While typing this I rebooted the machine and it happened again. I
> > didn't start anything in particular although the machine runs some
> > jails.
> >
> > x0: 0x00000000000000e0
> >    x1: 0xffffa00090317a48
> >    x2: 0xffffa000f79d4f00
> >    x3: 0xffffa000c61a44a8
> >    x4: 0xffff0000deefe460 ($d.2 + 0xdd776560)
> >    x5: 0xffffa001250e4c00
> >    x6: 0xffff0000e54025b5 ($d.5 + 0xc)
> >    x7: 0x000000000000030a
> >    x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100)
> >    x9: 0x0000000000000001
> >   x10: 0x0000000000000000
> >   x11: 0x0000000000000001
> >   x12: 0x0000000000000002
> >   x13: 0x0000000000000000
> >   x14: 0x0000000000000001
> >   x15: 0x0000000000000000
> >   x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310)
> >   x17: 0xffff0000004e0d44 (rms_rlock + 0x0)
> >   x18: 0xffff0000deefe280 ($d.2 + 0xdd776380)
> >   x19: 0x0000000000000000
> >   x20: 0xffff0000deefe460 ($d.2 + 0xdd776560)
> >   x21: 0x7fffffffffffffff
> >   x22: 0xffffa00090317a48
> >   x23: 0xffffa000f79d4f00
> >   x24: 0xffffa001067ef910
> >   x25: 0x00000000000000e0
> >   x26: 0xffffa000158a8000
> >   x27: 0x0000000000000000
> >   x28: 0xffffa000158a8000
> >   x29: 0xffff0000deefe280 ($d.2 + 0xdd776380)
> >    sp: 0xffff0000deefe280
> >    lr: 0xffff000001623564 (zfs_clone_range + 0x6c)
> >   elr: 0xffff0000004e0d60 (rms_rlock + 0x1c)
> > spsr: 0x00000000a0000045
> >   far: 0x0000000000000108
> >   esr: 0x0000000096000004
> > panic: data abort in critical section or under mutex
> > cpuid = 1
> > time = 1699610885
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> > vpanic() at vpanic+0x1a0
> > panic() at panic+0x48
> > data_abort() at data_abort+0x2fc
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0x96000004
> > rms_rlock() at rms_rlock+0x1c
> > zfs_clone_range() at zfs_clone_range+0x68
> > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> > null_bypass() at null_bypass+0x118
> > vn_copy_file_range() at vn_copy_file_range+0x18c
> > kern_copy_file_range() at kern_copy_file_range+0x36c
> > sys_copy_file_range() at sys_copy_file_range+0x8c
> > do_el0_sync() at do_el0_sync+0x634
> > handle_el0_sync() at handle_el0_sync+0x48
> > --- exception, esr 0x56000000
> > KDB: enter: panic
> > [ thread pid 3792 tid 100394 ]
> > Stopped at      kdb_enter+0x48: str     xzr, [x19, #768]
> > db>
> >
> > I'll keep the debugger open for a while. Can I type something for
> > additional info?
> >
> > Regards,
> > Ronald.
>
> --
> Alexander Motin





Hi,

Build a new kernel today.
FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov 14 
12:16:28 CET 2023     
ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG
 arm64

x0: 0x00000000000008e0 x1: 0xffffa0006ce1fb38 x2: 0xffffa0006837a400 x3: 0xffffa0012c503a48 x4: 0xffff0000eb0ef430 (next_index + 0x815d790) x5: 0xffffa00152636300 x6: 0xffff0000e2e025b5 ($d.5 + 0xc) x7: 0x000000000000030a x8: 0xffff0000eb3212c0 (next_index + 0x838f620) x9: 0x0000000000000001 x10: 0x0000000000000000 x11: 0x0000000000000001 x12: 0x0000000000000002 x13: 0x0000000000000000 x14: 0x0000000000000001 x15: 0x0000000000000000
x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328)
x17: 0xffff0000004e0c28 (rms_rlock + 0x0)
x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0)
x19: 0x0000000000000800
x20: 0xffff0000eb0ef430 (next_index + 0x815d790)
x21: 0x7fffffffffffffff
x22: 0xffffa0006ce1fb38
x23: 0xffffa0006837a400
x24: 0xffffa001ee486000
x25: 0x00000000000008e0
x26: 0xffffa000135ca000
x27: 0x0000000000000800
x28: 0xffffa000135ca000
x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0)
 sp: 0xffff0000eb0ef250
 lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c)
elr: 0xffff0000004e0c44 (rms_rlock + 0x1c)
spsr: 0x00000000a0000045
far: 0x0000000000000908
esr: 0x0000000096000004
panic: data abort in critical section or under mutex
cpuid = 2
time = 1699966486
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x1c0
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 3620 tid 100911 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #768]
db>

This happens as soon as I start poudriere in a jenkins-agent jail.

AFAIK this includes the two recent vn_copy_file_range changes of Konstantin.

Next I will install a GENERIC kernel instead of GENERIC-NODEBUG.

Regards,
Ronald.

Reply via email to