2023-03-20 at 17:56, Peter Rosin wrote:
> Dear Maintainer,
> 
> After crashing the kernel with "echo c > /proc/sysrq-trigger" the
> watchdoggery sometimes fails to trigger a reboot. It's as if the
> watchdog daemon continues to successfully perform its checks and
> thus continues to service the hardware watchdog even if the
> kernel has panicked.
> 
> The watchdog configuration is trivial:
> 
> watchdog-device = /dev/watchdog
> interval = 10
> realtime = yes
> priority = 1
> pidfile = /run/foo.pid
> pidfile = /run/bar.pid
> 
> When reading the manual I noticed this passage:
> 
>         "watchdog will try periodically to fork itself to
>         see whether the process table is full."
> 
> Since I was a bit sceptical that a panicked kernel could
> successfully fork, I wondered a bit about what "periodically"
> meant. So I went digging to see exactly how often that fork
> test is performed and how long I should expect to wait for it,
> but it appears it is no longer done at all.
> 
> To verify, I added an empty script that simply returns 0 to
> /etc/watchdog.d and after that, the watchdog kicks in as expected.
> That's arguably heavier than a fork-exit-test, but still an
> indication.
> 
> I then went digging in the git history to check if it might be
> intentional, but it appears not. The way I read it, the check
> went missing along with the 12-year-old commit
> 0fc6d009c78f ("This patch allows zero or more scripts/programs...")
> which was new for version 5.10.
> 
> Notice how the "if (tbinary == NULL)" test is moved to before the
> fork() call in the check_bin() function in that patch. But maybe
> I misread something?
> 
> Anyway, please repair the broken fork test (or adjust the manual
> to the new reality.)

I have created a patch and an upstream merge request that restore the fork test:
https://sourceforge.net/p/watchdog/code/merge-requests/4/
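
For readers who have not seen such a check, the kind of fork test the
manual describes can be sketched roughly as follows. This is only an
illustration and not the code from the merge request: the function name
fork_test() and its return convention are made up for this example, and
the real daemon may handle the child differently.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the watchdog sources.
 * Forks a child that exits immediately, then reaps it.
 * Returns 0 on success, or an errno value if the fork or the
 * wait failed (for example EAGAIN when the process table is full).
 */
int fork_test(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return errno;           /* fork itself failed */

    if (pid == 0)
        _exit(0);               /* child: nothing to do, exit at once */

    /* parent: reap the child, retrying if interrupted by a signal */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return errno;
    }

    /* treat anything other than a clean exit(0) as a failure */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return ECHILD;
}
```

A caller would run this once per check interval and treat a non-zero
return as a failed check, in the same way as any other failed test.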

Cheers,
Peter
