Package: watchdog
Followup-For: Bug #923254
X-Debbugs-Cc: [email protected]

I wanted to share what I'm doing locally to address this, and how I ran into it.
I have not submitted a Debian patch before, but I can try to figure out the
process if anyone would like me to.

There is a way to react differently when the service is restarted than when it
is stopped or killed some other way: have systemd send a different kill signal
for restarts, and check for that signal in `ExecStopPost=`. To do this, remove
the existing `ExecStopPost=` line and add the following:

```
RestartKillSignal=SIGINT
ExecStopPost=/bin/sh -c 'if [ "$SERVICE_RESULT" != "success" ] || \
  [ "$EXIT_STATUS" != "INT" ]; then \
  /bin/systemctl start --no-block wd_keepalive.service; fi'
```

This starts the keepalive service directly when watchdog is stopped, instead of
relying on `OnFailure=` for that, so it should also be possible to remove
`/bin/systemctl reset-failed` from wd_keepalive.service, as requested in
#835496.
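To make the condition easier to check, here is a small sketch that evaluates the
same test outside systemd. The function name `keepalive_decision` is
hypothetical, and the sample values assume that a restart leaves
`$SERVICE_RESULT` as `success` and `$EXIT_STATUS` as `INT`, i.e. that watchdog
is terminated by the SIGINT itself rather than catching it and exiting with a
numeric status. (systemd treats death by SIGINT as a clean exit by default.)

```
#!/bin/sh
# Hypothetical helper mirroring the ExecStopPost= test above.
# $1 plays the role of $SERVICE_RESULT, $2 the role of $EXIT_STATUS.
# Prints "start" if wd_keepalive.service would be started, "skip" otherwise.
keepalive_decision() {
  if [ "$1" != "success" ] || [ "$2" != "INT" ]; then
    echo start
  else
    echo skip
  fi
}

# Restart: assumes RestartKillSignal=SIGINT kills watchdog, reported as success/INT.
keepalive_decision success INT
# Ordinary stop with the default SIGTERM: keepalive should take over.
keepalive_decision success TERM
# Watchdog killed by SIGKILL: systemd reports result "signal".
keepalive_decision signal KILL
# Watchdog exited with a nonzero status.
keepalive_decision exit-code 1
```

Only the restart case skips starting wd_keepalive; a normal stop or any failure
starts it.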

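Until this is fixed, other affected users can apply these changes with an
override file: run `systemctl edit watchdog` and add a `[Service]` section
containing the `RestartKillSignal=` and `ExecStopPost=` lines shown earlier,
with an empty `ExecStopPost=` line placed before the new `ExecStopPost=` line.
The empty assignment clears the `ExecStopPost=` command from the packaged unit,
so it does not run alongside the new one.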
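For reference, a complete drop-in might look like the sketch below. The path in
the comment is where `systemctl edit watchdog` normally saves overrides; the
command is the same one from the unit snippet, split with trailing backslashes,
which systemd joins into a single line when reading the file.

```
# /etc/systemd/system/watchdog.service.d/override.conf
[Service]
RestartKillSignal=SIGINT
# Empty assignment: drop the ExecStopPost= inherited from the packaged unit.
ExecStopPost=
ExecStopPost=/bin/sh -c 'if [ "$SERVICE_RESULT" != "success" ] || \
  [ "$EXIT_STATUS" != "INT" ]; then \
  /bin/systemctl start --no-block wd_keepalive.service; fi'
```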

I encountered this bug as part of a series of troublesome issues in a
Debian-based distribution. An automatic update caused systemd to re-execute,
which restarted watchdog. Watchdog then failed as described, which started
wd_keepalive. Separately, every time systemd re-executed, the machine lost
connectivity, and because wd_keepalive was running instead of watchdog, the
system did not reboot as I wanted it to. As a result, when the affected server
updated, I could not access it again until I had physical access, which was the
very problem I installed watchdog to avoid. (Guess I should have turned on
automatic reboots for updates, too!)
