Can you share your kernel config file (and src.conf / make.conf if they exist)?
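
(If the kernel was built with "options INCLUDE_CONFIG_FILE", something along these lines should dump the exact config it was built from; the output file name is just a placeholder:

    # only works if the kernel was built with INCLUDE_CONFIG_FILE
    sysctl -n kern.conftxt > MYKERNEL.txt

Otherwise the config file from sys/amd64/conf, plus anything it includes, is fine.)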

This second panic is in the IPsec code. My current thinking is that your kernel config is triggering a bug that manifests in multiple places but isn’t actually caused by any of those places.

I’d like to be able to reproduce it so we can debug it.

Best regards,
Kristof

On 20 Nov 2020, at 12:02, Peter Blok wrote:
Hi Kristof,

This is 12-stable. With the previous bridge epochification (the one that was backed out), my config panicked too.

I don’t have any local modifications. I did a clean rebuild after removing /usr/obj/usr

My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and nmdm.ko as modules. Everything else is statically linked. I have removed all drivers not needed for the hardware at hand.

My bridge connects two VLANs from the same trunk, the jail epair devices, and the bhyve tap devices.
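
Roughly like this (interface names below are placeholders, not the exact ones on this box):

    ifconfig bridge0 create
    ifconfig bridge0 addm vlan100 addm vlan200 up
    # the jail start scripts add the epairs, bhyve adds the taps, e.g.
    ifconfig bridge0 addm epair0a addm tap0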

The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump available for analysis.
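
Dump handling is set up along the usual lines (sketch, standard rc.conf knobs assumed):

    # /etc/rc.conf - crash dump handling
    dumpdev="AUTO"          # write a kernel dump to the swap device on panic
    dumpdir="/var/crash"    # savecore(8) extracts vmcore.N here on the next boot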

Previously I had the following crash with r363492:

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xffffffff00000410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80692326
stack pointer           = 0x28:0xfffffe00c06097b0
frame pointer           = 0x28:0xfffffe00c06097f0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 2030 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0xffffffff80698165 at kdb_backtrace+0x65
#1 0xffffffff8064d67b at vpanic+0x17b
#2 0xffffffff8064d4f3 at panic+0x43
#3 0xffffffff809cc311 at trap_fatal+0x391
#4 0xffffffff809cc36f at trap_pfault+0x4f
#5 0xffffffff809cb9b6 at trap+0x286
#6 0xffffffff809a5b28 at calltrap+0x8
#7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d
#8 0xffffffff8069213a at epoch_wait_preempt+0xaa
#9 0xffffffff807615b7 at ipsec_ioctl+0x3a7
#10 0xffffffff8075274f at ifioctl+0x47f
#11 0xffffffff806b5ea7 at kern_ioctl+0x2b7
#12 0xffffffff806b5b4a at sys_ioctl+0xfa
#13 0xffffffff809ccec7 at amd64_syscall+0x387
#14 0xffffffff809a6450 at fast_syscall_common+0x101




On 20 Nov 2020, at 11:30, Kristof Provost <k...@freebsd.org> wrote:

On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org wrote:
I’m afraid the last epoch fix for bridge is not solving the problem (or perhaps it creates a new one).

We’re talking about the stable/12 branch, right?

This seems to happen when the jail epair is added to the bridge.

There must be something more to it than that. I’ve run the bridge tests on stable/12 without issue, and this is a problem we didn’t see when the bridge epochification initially went into stable/12.

Do you have a custom kernel config? Other patches? What exact commands do you run to trigger the panic?
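
For reference, my naive attempt at a reproducer would be something along these lines (names entirely made up, based only on your description, not a confirmed trigger):

    # hypothetical sketch of the setup being described
    ifconfig bridge0 create up
    ifconfig epair0 create
    ifconfig bridge0 addm epair0a
    jail -c name=test path=/ vnet persist vnet.interface=epair0b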

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80695e76
stack pointer           = 0x28:0xfffffe00bf14e6e0
frame pointer           = 0x28:0xfffffe00bf14e720
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1686 (jail)
trap number             = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0xffffffff8069bb85 at kdb_backtrace+0x65
#1 0xffffffff80650a4b at vpanic+0x17b
#2 0xffffffff806508c3 at panic+0x43
#3 0xffffffff809d0351 at trap_fatal+0x391
#4 0xffffffff809d03af at trap_pfault+0x4f
#5 0xffffffff809cf9f6 at trap+0x286
#6 0xffffffff809a98c8 at calltrap+0x8
#7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
#9 0xffffffff80757d40 at vnet_if_init+0x120
#10 0xffffffff8078c994 at vnet_alloc+0x114
#11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
#12 0xffffffff80620190 at sys_jail_set+0x40
#13 0xffffffff809d0f07 at amd64_syscall+0x387
#14 0xffffffff809aa1ee at fast_syscall_common+0xf8

This panic is rather odd: it isn’t even in the bridge code. It happens during the initial creation of the vnet, and I don’t really see how that path could trigger a panic at all. It looks as if something corrupted net_epoch_preempt by overwriting epoch->e_epoch. The bridge patches only access this variable through the well-established functions and macros, so I see no obvious way that they could corrupt it.
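
Once a vmcore is available, something along these lines in kgdb should show whether the epoch structure really has been stomped on (paths, dump number and debug symbols assumed):

    # inspect the net epoch state in the crash dump
    kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
    (kgdb) print *net_epoch_preempt
    (kgdb) print net_epoch_preempt->e_epoch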

Best regards,
Kristof

