Hi,
I’ve been trying to debug the IPMI panic event and OEM string handling and can
no longer tell whether this is a bug or whether something subtle has changed in
my config that I’m just not seeing.
(Note: I’m willing to pay for consulting.)
I have machines that we’ve moved from an older setup (Gentoo, (mostly) vanilla
kernel 4.19.157) to a newer setup (NixOS, (mostly) vanilla kernel 5.10.159) and
I’m now experiencing crashes that seem to be kernel panics but do not leave the
usual messages in the IPMI SEL.
The kernel does include the necessary drivers, the watchdog is active and the
SEL shows the watchdog action. I have reason to think that it’s a panic because
the watchdog timeout jumps to 255, as it typically does on a panic.
Here’s the IPMI-related config and cmdline from the old kernel where it works:
BOOT_IMAGE=/kernel-genkernel-x86_64-4.19.157 root=/dev/vgsys/root ro
rootfstype=ext4 dolvm ipmi_watchdog.timeout=60 igb.InterruptThrottleRate=1
ixgbe.InterruptThrottleRate=1 console=ttyS2,57600
# CONFIG_ACPI_IPMI is not set
CONFIG_IPMI_HANDLER=y
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
# CONFIG_IPMI_SSIF is not set
CONFIG_IPMI_WATCHDOG=y
CONFIG_IPMI_POWEROFF=y
On that system everything is statically compiled, so lsmod shows nothing
IPMI-related, and the kernel log shows:
[ 4.374757] ipmi device interface
[ 4.389388] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 4.402087] ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[ 4.413907] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 4.425570] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[ 4.437408] ipmi_si IPI0001:00: [io 0x0ca2] regsize 1 spacing 1 irq 0
[ 4.450449] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[ 4.467818] ipmi_si: Adding ACPI-specified kcs state machine
[ 4.479139] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
[ 4.567613] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 4.617693] ipmi_si IPI0001:00: Found new BMC (man_id: 0x002a7c, prod_id: 0x0624, dev_id: 0x20)
[ 4.671871] ipmi_si IPI0001:00: IPMI kcs interface initialized
And here’s the controller info:
Device ID : 32
Device Revision : 1
Firmware Revision : 2.24
IPMI Version : 2.0
Manufacturer ID : 10876
Manufacturer Name : Supermicro
Product ID : 1572 (0x0624)
Product Name : Unknown (0x624)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
Aux Firmware Rev Info :
0x06
0x00
0x00
0x00
And here’s the NixOS environment where it doesn’t work:
BOOT_IMAGE=/kernels/qy42jhicvvqb0p7x2h0i46b2x0f1w74q-linux-5.10.159-bzImage
init=/nix/store/qx33nyr0f60y76yzmbgsikxr60lqzdb3-nixos-system-...-21.05pre-git/init
dolvm ipmi_watchdog.timeout=60 igb.InterruptThrottleRate=1
ixgbe.InterruptThrottleRate=1 panic=1 boot.panic_on_fail
systemd.journald.forward_to_console=no systemd.log_target=kmsg
console=ttyS1,115200 loglevel=7
CONFIG_ACPI_IPMI=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
On the newer system this is what appears in the kernel log:
[ 22.070935] ipmi device interface
[ 22.086353] systemd-modules-load[572]: Inserted module 'ipmi_watchdog'
[ 22.904717] ipmi_si: IPMI System Interface driver
[ 22.911022] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 22.917393] ipmi_platform: ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 0
[ 22.925092] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 22.931023] ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 0
[ 23.119892] ipmi_si dmi-ipmi-si.0: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[ 23.438469] ipmi_si dmi-ipmi-si.0: IPMI kcs interface initialized
[ 23.441630] ipmi_ssif: IPMI SSIF Interface driver
And the ipmi-related modules look like this:
ipmi_ssif 40960 0
ipmi_si 73728 1
ipmi_watchdog 32768 1
ipmi_devintf 20480 0
ipmi_msghandler 73728 4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif
i2c_core 102400 5 drm_kms_helper,i2c_algo_bit,mgag200,ipmi_ssif,drm
In this case it’s a DELL IPMI controller:
Device ID : 32
Device Revision : 0
Firmware Revision : 1.52
IPMI Version : 2.0
Manufacturer ID : 674
Manufacturer Name : DELL Inc
Product ID : 256 (0x0100)
Product Name : Unknown (0x100)
Device Available : yes
Provides Device SDRs : yes
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
Bridge
Chassis Device
Aux Firmware Rev Info :
0x00
0x0a
0x00
0x00
The behaviour has been the same on various Supermicro machines, though.
So, having run out of ideas about what to look for, I’m left with these questions:
1. When I trigger a panic manually via “echo c > /proc/sysrq-trigger”, that
should also create a panic message that appears in the SEL, right?
2. Is there anything that comes to mind that I could have configured
incorrectly in the kernel?
3. Or is there anything I can inspect after boot to find out what “panic_op”
is set to?
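
For question 3, one place to look is sysfs. The following is a minimal sketch, not a verified procedure. It assumes that ipmi_msghandler exposes its panic_op module parameter at /sys/module/ipmi_msghandler/parameters/panic_op, that the file is readable by root only, and that the possible values are none, event and string as described in the kernel's IPMI documentation. The helper name show_panic_op and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# show_panic_op [SYSFS_ROOT]
# Print the current panic_op setting of the IPMI message handler.
# Assumption: the parameter file exists at the path below on both the
# built-in (old) and the modular (new) kernel. SYSFS_ROOT defaults to
# /sys and is only overridable so the helper can be tried on a fake tree.
show_panic_op() {
    param="${1:-/sys}/module/ipmi_msghandler/parameters/panic_op"
    if [ -r "$param" ]; then
        printf 'panic_op=%s\n' "$(cat "$param")"
    else
        # Missing file or missing permissions: report it and fail.
        printf 'panic_op=unavailable\n'
        return 1
    fi
}
```

Running show_panic_op as root on both the old and the new system should show whether the two kernels ended up with different settings. If the value turns out to be none on the new system, booting with ipmi_msghandler.panic_op=string on the kernel command line would be one thing to try, assuming the parameter is accepted there.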
I have running systems with both the old and new setups available that I can
freely poke to analyze it interactively.
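
For question 1, a before/after comparison of the SEL makes it easy to see exactly which records a manual panic leaves behind. This is a rough sketch under the assumption that the SEL is dumped to a file with "ipmitool sel elist" before the crash and again after the reboot. The helper name new_sel_entries and the file paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# new_sel_entries BEFORE AFTER
# Print every line of the AFTER dump that does not occur verbatim in the
# BEFORE dump. Lines are compared as fixed strings (-F) and must match
# as whole lines (-x); -v inverts the match so only new lines remain.
new_sel_entries() {
    grep -F -x -v -f "$1" "$2"
}

# Intended use (as root, on a machine that may crash):
#   ipmitool sel elist > /var/tmp/sel.before
#   echo c > /proc/sysrq-trigger
#   ... after the reboot ...
#   ipmitool sel elist > /var/tmp/sel.after
#   new_sel_entries /var/tmp/sel.before /var/tmp/sel.after
```

On the old setup the output should contain the panic event next to the watchdog entry. On the new setup, going by the symptoms described above, only the watchdog entry would be expected.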
Any help is appreciated as I’ve run out of ideas and (just to make sure) I’m
happy to pay proper consulting rates (especially happy to support open source
work).
Kind regards,
Christian Theune
--
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick
_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer