On 2/24/19 1:23 AM, Andreas Kempe wrote:
Hello,
When running a Mellanox MT26418 in Ethernet mode, the kernel crashes
with the following stack trace on system shutdown:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e3f5f4
stack pointer = 0x28:0xfffffe064abec6e0
frame pointer = 0x28:0xfffffe064abec700
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1 (init)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b4c5b7 at kdb_backtrace+0x67
#1 0xffffffff80b05b57 at vpanic+0x177
#2 0xffffffff80b059d3 at panic+0x43
#3 0xffffffff8106efdf at trap_fatal+0x35f
#4 0xffffffff8106f039 at trap_pfault+0x49
#5 0xffffffff8106e807 at trap+0x2c7
#6 0xffffffff8104f03c at calltrap+0x8
#7 0xffffffff80e3fae2 at mlx4_en_stop_port+0x3d2
#8 0xffffffff80e40ff6 at mlx4_en_destroy_netdev+0x1e6
#9 0xffffffff80e3e47d at mlx4_en_remove+0xcd
#10 0xffffffff80e1ab01 at mlx4_remove_device+0xb1
#11 0xffffffff80e1b0b8 at mlx4_unregister_device+0x98
#12 0xffffffff80e1c5c5 at mlx4_unload_one+0x85
#13 0xffffffff80e23543 at mlx4_shutdown+0x83
#14 0xffffffff80d6b6e9 at linux_pci_shutdown+0x39
#15 0xffffffff80b4004a at bus_generic_shutdown+0x5a
#16 0xffffffff80b4004a at bus_generic_shutdown+0x5a
#17 0xffffffff80b4004a at bus_generic_shutdown+0x5a
I've traced the issue to the following lines of code in
sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c in mlx4_en_destroy_netdev():
	/* Unregister device - this will close the port if it was up */
	if (priv->registered) {
		mutex_lock(&mdev->state_lock);
		ether_ifdetach(dev);
		mutex_unlock(&mdev->state_lock);
	}

	mutex_lock(&mdev->state_lock);
	mlx4_en_stop_port(dev);
	mutex_unlock(&mdev->state_lock);
The issue is that mlx4_en_stop_port() follows the call chain below and
tries to fetch the MAC address of the device in mlx4_en_put_qp():
mlx4_en_destroy_netdev->mlx4_en_stop_port->mlx4_en_put_qp
The sequence above causes the kernel to page fault because the MAC
address was already freed by the preceding call to ether_ifdetach(), in
if_detach_internal(), via the following call chain:
mlx4_en_destroy_netdev->ether_ifdetach->if_detach->if_detach_internal
I've written a small workaround that works on our test machine, although
I suspect this could potentially cause issues as we're destroying the
port before we destroy the interface. Please see the attached patch for
the workaround.
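
The ordering problem described above can be modelled in userspace with a
minimal sketch. This is not the attached patch and not driver code: every
fake_* name is hypothetical, and the model assumes that detaching leaves
the address pointer NULL, which would be consistent with the fault virtual
address 0x0 in the trace. Where the kernel would fault, the model returns
-1 instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the interface; NOT the real struct ifnet. */
struct fake_ifnet {
	unsigned char *lladdr;	/* heap-allocated MAC address */
	int port_up;		/* non-zero while the port is running */
};

/* Models interface registration: allocates the MAC and brings the port up. */
static int fake_attach(struct fake_ifnet *ifp)
{
	static const unsigned char mac[6] = { 0x00, 0x02, 0xc9, 0x00, 0x00, 0x01 };

	ifp->lladdr = malloc(sizeof(mac));
	if (ifp->lladdr == NULL)
		return -1;
	memcpy(ifp->lladdr, mac, sizeof(mac));
	ifp->port_up = 1;
	return 0;
}

/* Models ether_ifdetach() -> if_detach_internal(): releases the MAC. */
static void fake_ifdetach(struct fake_ifnet *ifp)
{
	free(ifp->lladdr);
	ifp->lladdr = NULL;	/* assumption: pointer is left NULL */
}

/*
 * Models mlx4_en_stop_port() -> mlx4_en_put_qp(), which needs the MAC.
 * Returns 0 on success, -1 where the kernel would take the page fault.
 */
static int fake_stop_port(struct fake_ifnet *ifp)
{
	unsigned char mac[6];

	if (!ifp->port_up)
		return 0;
	if (ifp->lladdr == NULL)
		return -1;
	memcpy(mac, ifp->lladdr, sizeof(mac));
	(void)mac;		/* a real driver would use it to release the QP */
	ifp->port_up = 0;
	return 0;
}

/*
 * stop_first == 0: detach, then stop the port (the order quoted above).
 * stop_first != 0: stop the port, then detach (the reordered teardown).
 * Returns the result of fake_stop_port(), or -2 if allocation failed.
 */
int fake_destroy(int stop_first)
{
	struct fake_ifnet ifp;
	int rc;

	if (fake_attach(&ifp) != 0)
		return -2;
	if (stop_first) {
		rc = fake_stop_port(&ifp);
		fake_ifdetach(&ifp);
	} else {
		fake_ifdetach(&ifp);
		rc = fake_stop_port(&ifp);
	}
	return rc;
}
```

In this model fake_destroy(0) returns -1 and fake_destroy(1) returns 0. It
only illustrates the dependency on the MAC address; it says nothing about
whether stopping the port before detaching is safe in the real driver.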
Cordially,
Andreas Kempe
Lysator ACS
CC'ing FreeBSD-drivers at Mellanox.
Thank you for your patch. We'll have a look at it.
--HPS