Well, dang, I had already fixed this a year and a half ago. I wish I
had a better memory.
Anyway, the fix is commit db05ddf7f321634c5659a0cf7ea56594e22365f7
("ipmi:watchdog: Set panic count to proper value on a panic") in
mainstream 5.16. I'm attaching that patch.
-corey
On Tue, Mar 14, 2023 at 03:58:26PM +0100, Christian Theune via
Openipmi-developer wrote:
> Awesome!
>
> > On 14. Mar 2023, at 15:54, Corey Minyard <[email protected]> wrote:
> >
> > On Tue, Mar 14, 2023 at 03:22:39PM +0100, Christian Theune via
> > Openipmi-developer wrote:
> >> Hi,
> >>
> >> sorry, I didn’t expect you to make me a branch. I had already taken your
> >> diff over to 5.10 as it applied cleanly … sorry for the additional work
> >> and thanks anyways.
> >
> > Ok, that's great. It's something about the IPMI watchdog panic
> > routines, and I can reproduce. I should be able to fix this pretty
> > quickly. I'll send a patch when I get this fixed.
> >
> > Thanks,
> >
> > -corey
> >
> >>
> >> Here’s the output:
> >>
> >> [ 6521.905890] sysrq: Trigger a crash
> >> [ 6521.909294] Kernel panic - not syncing: sysrq triggered crash
> >> [ 6521.915026] CPU: 1 PID: 43785 Comm: bash Tainted: G I
> >> 5.10.159 #1-NixOS
> >> [ 6521.922925] Hardware name: Dell Inc. PowerEdge R510/00HDP0, BIOS 1.11.0
> >> 07/23/2012
> >> [ 6521.930475] Call Trace:
> >> [ 6521.932923] dump_stack+0x6b/0x83
> >> [ 6521.936230] panic+0x101/0x2c8
> >> [ 6521.939276] ? printk+0x58/0x73
> >> [ 6521.942408] sysrq_handle_crash+0x16/0x20
> >> [ 6521.946407] __handle_sysrq.cold+0x43/0x11a
> >> [ 6521.950580] write_sysrq_trigger+0x24/0x40
> >> [ 6521.954668] proc_reg_write+0x51/0x90
> >> [ 6521.958322] vfs_write+0xc3/0x280
> >> [ 6521.961627] ksys_write+0x5f/0xe0
> >> [ 6521.964935] do_syscall_64+0x33/0x40
> >> [ 6521.968502] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >> [ 6521.973540] RIP: 0033:0x7f2c6b91a133
> >> [ 6521.977106] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
> >> 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> >> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89 d4 55 48 89 f5
> >> [ 6521.995836] RSP: 002b:00007ffc4cf11088 EFLAGS: 00000246 ORIG_RAX:
> >> 0000000000000001
> >> [ 6522.003387] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> >> 00007f2c6b91a133
> >> [ 6522.010505] RDX: 0000000000000002 RSI: 0000000001555c08 RDI:
> >> 0000000000000001
> >> [ 6522.017623] RBP: 0000000001555c08 R08: 000000000000000a R09:
> >> 00007f2c6b9aaf40
> >> [ 6522.024743] R10: 00000000016e4218 R11: 0000000000000246 R12:
> >> 0000000000000002
> >> [ 6522.031864] R13: 00007f2c6b9e8520 R14: 00007f2c6b9e8720 R15:
> >> 0000000000000002
> >> [ 6522.039085] Calling notifier panic_event+0x0/0x410 [ipmi_msghandler]
> >> (000000008eb8cb44)
> >> [ 6522.047071] IPMI message handler: IPMI: panic event handler
> >> [ 6522.052628] IPMI message handler: IPMI: handling panic event for intf
> >> 0: 00000000443777b3 0000000067d05ff8
> >> …
> >> and then it reboots after the 255 seconds from the watchdog timer have
> >> passed.
> >>
> >> Christian
> >>
> >>> On 13. Mar 2023, at 18:13, Corey Minyard <[email protected]> wrote:
> >>>
> >>> On Mon, Mar 13, 2023 at 05:42:39PM +0100, Christian Theune wrote:
> >>>> Hrghs. I’m applying your patch to 5.10 as my distro build infrastructure
> >>>> has some patches that don’t apply to 6.2 and that I don’t know how to
> >>>> circumvent quickly enough… :)
> >>>
> >>> Ok, there's a
> >>>
> >>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events-5.10
> >>>
> >>> branch available for you to pull. It's on top of latest 5.10.
> >>>
> >>> -corey
> >>>
> >>>>
> >>>>> On 13. Mar 2023, at 16:59, Christian Theune <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>> I should be easily able to run 6.2, no worries.
> >>>>>
> >>>>>
> >>>>>> On 13. Mar 2023, at 16:33, Corey Minyard <[email protected]> wrote:
> >>>>>>
> >>>>>> On Mon, Mar 13, 2023 at 02:07:01PM +0100, Christian Theune wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> yeah, the IPMI log is fine. This is a 10 minute interval job in our
> >>>>>>> system that exports the log and clears it:
> >>>>>>>
> >>>>>>> The job looks like this:
> >>>>>>>
> >>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool
> >>>>>>> sel elist
> >>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool
> >>>>>>> sel clear
> >>>>>>>
> >>>>>>> So it’s not atomic but it runs after the boot and the elist should
> >>>>>>> output it properly … at least it did in the past. ;)
> >>>>>>>
> >>>>>>> As I said - I’m happy to run any patches you have. If you point me to
> >>>>>>> a git branch somewhere I can switch that system easily.
> >>>>>>
> >>>>>> Ok, I have a branch at
> >>>>>>
> >>>>>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events
> >>>>>>
> >>>>>> that has debug tracing. It will print the function for all panic event
> >>>>>> handlers, their return values, and adds traces in the IPMI panic event
> >>>>>> handlers.
> >>>>>>
> >>>>>> It's a single patch right on top of 6.2; I'm not sure how portable
> >>>>>> it is to other kernel versions. I can port if you like.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> -corey
> >>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Christian
> >>>>>>>
> >>>>>>>>> On 13. Mar 2023, at 13:58, Corey Minyard <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Mar 13, 2023 at 10:27:51AM +0100, Christian Theune wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> alright, so here’s the output from the NixOS machine:
> >>>>>>>>>
> >>>>>>>>> root@xxx ~ # echo c >/proc/sysrq-trigger
> >>>>>>>>> client_loop: send disconnect: Broken pipe
> >>>>>>>>> …
> >>>>>>>>>
> >>>>>>>>> root@xxx ~ # journalctl -u ipmi-log.service
> >>>>>>>>> -- Journal begins at Sun 2023-02-26 14:25:36 CET, ends at Mon
> >>>>>>>>> 2023-03-13 10:25:27 CET. --
> >>>>>>>>> Mar 13 10:12:38 xxx ipmi-log-start[520973]: Clearing SEL. Please
> >>>>>>>>> allow a few seconds to erase.
> >>>>>>>>> ...
> >>>>>>>>> -- Boot fdef496e784e4541abd9ae40df472a0b --
> >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]: 1 | 03/13/2023 |
> >>>>>>>>> 09:12:49 | Event Logging Disabled SEL | Log area reset/cleared |
> >>>>>>>>> Asserted
> >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]: 2 | 03/13/2023 |
> >>>>>>>>> 09:21:06 | Watchdog2 OS Watchdog | Hard reset | Asserted
> >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1977]: Clearing SEL. Please
> >>>>>>>>> allow a few seconds to erase.
> >>>>>>>>
> >>>>>>>> Hmm, the SEL got cleared. That would clear out any of the logs that
> >>>>>>>> were issued before that time. I'm not sure when the above happened
> >>>>>>>> versus the crash, though. It looks like it occurred as part of the
> >>>>>>>> reboot, but I'm not sure what I'm seeing. Maybe you have a startup
> >>>>>>>> process that clears the SEL?
> >>>>>>>>
> >>>>>>>> Assuming that's not the issue, what you have looks ok. I'd need to
> >>>>>>>> add some logs to the kernel to see if the log operation ever
> >>>>>>>> happens.
> >>>>>>>>
> >>>>>>>> -corey
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The SOL log looks like this:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1107585.917689] sysrq: Trigger a crash
> >>>>>>>>> [1107585.921272] Kernel panic - not syncing: sysrq triggered crash
> >>>>>>>>> [1107585.927178] CPU: 1 PID: 521033 Comm: bash Tainted: G
> >>>>>>>>> I 5.10.159 #1-NixOS
> >>>>>>>>> [1107585.935335] Hardware name: Dell Inc. PowerEdge R510/00HDP0,
> >>>>>>>>> BIOS 1.11.0 07/23/2012
> >>>>>>>>> [1107585.943058] Call Trace:
> >>>>>>>>> [1107585.945680] dump_stack+0x6b/0x83
> >>>>>>>>> [1107585.949158] panic+0x101/0x2c8
> >>>>>>>>> [1107585.952379] ? printk+0x58/0x73
> >>>>>>>>> [1107585.955687] sysrq_handle_crash+0x16/0x20
> >>>>>>>>> [1107585.959859] __handle_sysrq.cold+0x43/0x11a
> >>>>>>>>> [1107585.964203] write_sysrq_trigger+0x24/0x40
> >>>>>>>>> [1107585.968463] proc_reg_write+0x51/0x90
> >>>>>>>>> [1107585.972290] vfs_write+0xc3/0x280
> >>>>>>>>> [1107585.975768] ksys_write+0x5f/0xe0
> >>>>>>>>> [1107585.979248] do_syscall_64+0x33/0x40
> >>>>>>>>> [1107585.982987] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >>>>>>>>> [1107585.988199] RIP: 0033:0x7f5873932133
> >>>>>>>>> [1107585.991938] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> >>>>>>>>> b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01
> >>>>>>>>> 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89
> >>>>>>>>> d4 55 48 89 f5
> >>>>>>>>> [1107586.010842] RSP: 002b:00007ffcc13808c8 EFLAGS: 00000246
> >>>>>>>>> ORIG_RAX: 0000000000000001
> >>>>>>>>> [1107586.018566] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> >>>>>>>>> 00007f5873932133
> >>>>>>>>> [1107586.025923] RDX: 0000000000000002 RSI: 00000000005c1c08 RDI:
> >>>>>>>>> 0000000000000001
> >>>>>>>>> [1107586.033213] RBP: 00000000005c1c08 R08: 000000000000000a R09:
> >>>>>>>>> 00007f58739c2f40
> >>>>>>>>> [1107586.040504] R10: 00000000005cc348 R11: 0000000000000246 R12:
> >>>>>>>>> 0000000000000002
> >>>>>>>>> [1107586.047794] R13: 00007f5873a00520 R14: 00007f5873a00720 R15:
> >>>>>>>>> 0000000000000002
> >>>>>>>>>
> >>>>>>>>> Nothing obvious to me here … if you have any further ideas what to
> >>>>>>>>> test, let me know. I should be more responsive again now.
> >>>>>>>>>
> >>>>>>>>> Thanks and kind regards,
> >>>>>>>>> Christian
> >>>>>>>>>
> >>>>>>>>>> On 5. Mar 2023, at 23:53, Corey Minyard <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Mar 01, 2023 at 06:00:07PM +0100, Christian Theune wrote:
> >>>>>>>>>>> I’m going to actually attach a serial console to watch the “echo
> >>>>>>>>>>> c” panic, maybe that gives _some_ indication.
> >>>>>>>>>>>
> >>>>>>>>>>> Otherwise: I can quickly run patches on the kernel there to try
> >>>>>>>>>>> out things. (And the funding offer still stands.)
> >>>>>>>>>>
> >>>>>>>>>> Any news on this? I'm curious what this could be.
> >>>>>>>>>>
> >>>>>>>>>> -corey
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Christian
> >>>>>>>>>>>
> >>>>>>>>>>>> On 1. Mar 2023, at 17:58, Corey Minyard <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Feb 28, 2023 at 06:36:17PM +0100, Christian Theune wrote:
> >>>>>>>>>>>>> Thanks, both machines report:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # cat /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>> string
> >>>>>>>>>>>>
> >>>>>>>>>>>> At this point, I have no idea. I'd have to start adding printks
> >>>>>>>>>>>> into the code and cause crashes to see what is happening.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe something is getting in the way of the panic notifiers and
> >>>>>>>>>>>> doing something to prevent the IPMI driver from working.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -corey
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 28. Feb 2023, at 18:04, Corey Minyard <[email protected]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Oh, I forgot. You can look at panic_op in
> >>>>>>>>>>>>>> /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -corey
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 05:48:07PM +0100, Christian Theune via
> >>>>>>>>>>>>>> Openipmi-developer wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 28. Feb 2023, at 17:36, Corey Minyard <[email protected]>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian Theune
> >>>>>>>>>>>>>>>> via Openipmi-developer wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I’ve been trying to debug the PANIC and OEM string handling
> >>>>>>>>>>>>>>>>> and am running out of ideas whether this is a bug or
> >>>>>>>>>>>>>>>>> whether something so subtle has changed in my config that
> >>>>>>>>>>>>>>>>> I’m just not seeing it.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> (Note: I’m willing to pay for consulting.)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Probably not necessary.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks! The offer always stands. If we should ever meet I’m
> >>>>>>>>>>>>>>> also able to pay in beverages. ;)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have machines that we’ve moved from an older setup
> >>>>>>>>>>>>>>>>> (Gentoo, (mostly) vanilla kernel 4.19.157) to a newer setup
> >>>>>>>>>>>>>>>>> (NixOS, (mostly) vanilla kernel 5.10.159) and I’m now
> >>>>>>>>>>>>>>>>> experiencing crashes that seem to be kernel panics but do
> >>>>>>>>>>>>>>>>> not get the usual messages in the IPMI SEL.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I just tested on stock 5.10.159 and it worked without
> >>>>>>>>>>>>>>>> issue. Everything you have below looks ok.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you test by causing a crash with:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> echo c >/proc/sysrq-trigger
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> and see if it works?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Yeah, already tried that and unfortunately that _doesn’t_
> >>>>>>>>>>>>>>> work.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> It sounds like you are having some type of crash that
> >>>>>>>>>>>>>>>> you would normally use the IPMI logs to debug. However,
> >>>>>>>>>>>>>>>> they aren't perfect; the system has to stay up long
> >>>>>>>>>>>>>>>> enough to get them into the event log.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think they are staying up long enough, because a panic
> >>>>>>>>>>>>>>> triggers the 255-second bump in the watchdog and only then
> >>>>>>>>>>>>>>> do they go down. However, I’ve also noticed that the kernel
> >>>>>>>>>>>>>>> _should_ be rebooting after a panic much faster (and not
> >>>>>>>>>>>>>>> rely on the watchdog), and that doesn’t happen either.
> >>>>>>>>>>>>>>> (Sorry, this just popped up from the back of my head.)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In this situation, getting a serial console (probably
> >>>>>>>>>>>>>>>> through IPMI Serial over LAN) and getting the console
> >>>>>>>>>>>>>>>> output on a crash is probably your best option. You can
> >>>>>>>>>>>>>>>> use ipmitool for this, or I have a library that is able
> >>>>>>>>>>>>>>>> to make connections to serial ports, including through
> >>>>>>>>>>>>>>>> IPMI SoL.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Yup. Been there, too. :)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Unfortunately we’re currently chasing something that pops up
> >>>>>>>>>>>>>>> very randomly on somewhat odd machines and I also have the
> >>>>>>>>>>>>>>> feeling that it’s systematically broken right now (as the
> >>>>>>>>>>>>>>> “echo c” doesn’t work).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks a lot,
> >>>>>>>>>>>>>>> Christian
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Christian Theune · [email protected] · +49 345 219401 0
> >>>>>>>>>>>>>>> Flying Circus Internet Operations GmbH ·
> >>>>>>>>>>>>>>> https://flyingcircus.io
> >>>>>>>>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>>>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune,
> >>>>>>>>>>>>>>> Christian Zagrodnick
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>> Openipmi-developer mailing list
> >>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>> Christian Theune
From db05ddf7f321634c5659a0cf7ea56594e22365f7 Mon Sep 17 00:00:00 2001
From: Corey Minyard <[email protected]>
Date: Mon, 20 Sep 2021 06:25:37 -0500
Subject: [PATCH] ipmi:watchdog: Set panic count to proper value on a panic

You will get two decrements when the messages on a panic are sent, not
one, since commit 2033f6858970 ("ipmi: Free receive messages when in an
oops") was added, but the watchdog code had a bug where it didn't set
the value properly.

Reported-by: Anton Lundin <[email protected]>
Cc: <[email protected]> # v5.4+
Fixes: 2033f6858970 ("ipmi: Free receive messages when in an oops")
Signed-off-by: Corey Minyard <[email protected]>
---
 drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index e4ff3b50de7f..f855a9665c28 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -497,7 +497,7 @@ static void panic_halt_ipmi_heartbeat(void)
 	msg.cmd = IPMI_WDOG_RESET_TIMER;
 	msg.data = NULL;
 	msg.data_len = 0;
-	atomic_inc(&panic_done_count);
+	atomic_add(2, &panic_done_count);
 	rv = ipmi_request_supply_msgs(watchdog_user,
 				      (struct ipmi_addr *) &addr,
 				      0,
@@ -507,7 +507,7 @@ static void panic_halt_ipmi_heartbeat(void)
 				      &panic_halt_heartbeat_recv_msg,
 				      1);
 	if (rv)
-		atomic_dec(&panic_done_count);
+		atomic_sub(2, &panic_done_count);
 }
 
 static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -531,12 +531,12 @@ static void panic_halt_ipmi_set_timeout(void)
 	/* Wait for the messages to be free. */
 	while (atomic_read(&panic_done_count) != 0)
 		ipmi_poll_interface(watchdog_user);
-	atomic_inc(&panic_done_count);
+	atomic_add(2, &panic_done_count);
 	rv = __ipmi_set_timeout(&panic_halt_smi_msg,
 				&panic_halt_recv_msg,
 				&send_heartbeat_now);
 	if (rv) {
-		atomic_dec(&panic_done_count);
+		atomic_sub(2, &panic_done_count);
 		pr_warn("Unable to extend the watchdog timeout\n");
 	} else {
 		if (send_heartbeat_now)
--
2.34.1
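
To illustrate the counting problem the commit message describes, here is
a minimal user-space sketch in C11. It is not kernel code. It assumes,
as the commit message states, that each panic-time request results in
two decrements of the counter, one per freed message. The function names
free_one_msg() and count_after_request() are hypothetical, and
panic_done_count here is a plain atomic_int standing in for the driver's
atomic_t of the same name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the driver's panic_done_count (assumption: user-space model). */
static atomic_int panic_done_count;

/* Hypothetical helper: freeing one message buffer releases one count. */
static void free_one_msg(void)
{
	atomic_fetch_sub(&panic_done_count, 1);
}

/*
 * Hypothetical helper: reserve `reserved` counts for one request, then
 * model both of its messages being freed. Returns the final counter
 * value; 0 means a "wait until the count is zero" loop could exit.
 */
static int count_after_request(int reserved)
{
	assert(reserved > 0);
	atomic_store(&panic_done_count, 0);
	atomic_fetch_add(&panic_done_count, reserved);
	free_one_msg();	/* first message freed */
	free_one_msg();	/* second message freed */
	return atomic_load(&panic_done_count);
}
```

Under these assumptions, count_after_request(1) leaves the counter at -1,
so a loop of the form "while (atomic_read(&panic_done_count) != 0)" would
never see zero, while count_after_request(2) brings it back to 0.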