https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289017

--- Comment #7 from Gui-Dong Han <[email protected]> ---
(In reply to Zhenlei Huang from comment #1)

I can reliably reproduce the panic on an unmodified GENERIC kernel within
seconds using the scripts provided.

The stack trace below, however, was captured on a kernel with artificial
delays inserted to widen the race window.

Crash log:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x40
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff82825bdd
stack pointer           = 0x28:0xfffffe0068fc58c0
frame pointer           = 0x28:0xfffffe0068fc58d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
[TOCTOU_DEBUG] SIOCSLAGG: Change!
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 860 (poc)
rdi: fffff8000363c000 rsi: 00000000d1f2023f rdx: 00000000000000ff
rcx: fffffe0068fc58ec  r8: 0000000000000800  r9: 0000000000000008
rax: 0000000000000000 rbx: fffff80004a08300 rbp: fffffe0068fc58d0
r10: fffffe0068fc5800 r11: 00fff58b8d9e8b8c r12: 000000000000000e
[TOCTOU_DEBUG] SIOCSLAGG: Change!
r13: 0000000000000008 r14: fffff8000363c000 r15: fffff80003624800
trap number             = 12
panic: page fault
cpuid = 2
time = 1765735007
KDB: stack backtrace:
#0 0xffffffff80ba8f1d at kdb_backtrace+0x5d
#1 0xffffffff80b5aa11 at vpanic+0x161
#2 0xffffffff80b5a8a3 at panic+0x43
#3 0xffffffff8104dbfa at trap_pfault+0x3da
#4 0xffffffff81023e88 at calltrap+0x8
#5 0xffffffff82821f7a at lagg_lacp_start+0x1a
#6 0xffffffff8281fa25 at lagg_transmit_ethernet+0xb5
#7 0xffffffff80c85c5c at ether_output_frame+0xcc
#8 0xffffffff80c85a50 at ether_output+0x6b0
#9 0xffffffff80d21a48 at ip_output+0x13a8
#10 0xffffffff80d52cf0 at udp_send+0xb60
#11 0xffffffff80c0145c at sosend_dgram+0x31c
#12 0xffffffff80c0242f at sousrsend+0x5f
#13 0xffffffff80c0aec0 at kern_sendit+0x1c0
#14 0xffffffff80c0b1f2 at sendit+0x1b2
#15 0xffffffff80c0b02d at sys_sendto+0x4d
#16 0xffffffff8104e547 at amd64_syscall+0x117
#17 0xffffffff8102479b at fast_syscall_common+0xf8

This crash indicates that lagg_lacp_start was executing after the protocol
resources had already been cleared by the detach routine.

This confirms a severe lack of synchronization between the data path and the
control path, which can lead to various race conditions.

I strongly recommend validating any proposed fix by running the attached
stress-test scripts for an extended period.
