2023-03-20 at 17:56, Peter Rosin wrote:
> Dear Maintainer,
>
> After crashing the kernel with "echo c > /proc/sysrq-trigger", the
> watchdoggery sometimes fails to trigger a reboot. It's as if the
> watchdog daemon continues to successfully perform its checks and
> thus continues to service the hardware watchdog even though the
> kernel has panicked.
>
> The watchdog configuration is trivial:
>
> watchdog-device = /dev/watchdog
> interval = 10
> realtime = yes
> priority = 1
> pidfile = /run/foo.pid
> pidfile = /run/bar.pid
>
> When reading the manual I noticed this passage:
>
> "watchdog will try periodically to fork itself to
> see whether the process table is full."
>
> Since I was a bit skeptical that a panicked kernel could
> successfully fork, I wondered what "periodically" meant. So I went
> digging to see exactly how often that fork test is performed and
> how long I should expect to wait for it, but it appears it is no
> longer done at all.
>
> To verify, I added an empty script that simply returns 0 to
> /etc/watchdog.d, and after that the watchdog kicks in as expected.
> That's arguably heavier than a fork-exit test, but still an
> indication.
>
> I then went digging in the git history to check whether this might
> be intentional, but it appears not. The way I read it, the check
> went missing with the 12-year-old commit
> 0fc6d009c78f ("This patch allows zero or more scripts/programs...")
> which was new in version 5.10.
>
> Notice how the "if (tbinary == NULL)" test is moved to before the
> fork() call in the check_bin() function in that patch. But maybe
> I misread something?
>
> Anyway, please repair the broken fork test (or adjust the manual
> to the new reality).
I have created a patch and an upstream merge request that restore the fork test:
https://sourceforge.net/p/watchdog/code/merge-requests/4/

Cheers,
Peter