On Sun, 04.02.24 00:06, David Timber (d...@dev.snart.me) wrote:

> Systemd crashed on me the other day. I was writing up some Systemd units and
> testing them by running daemon-reload every time I changed them. Not the
> best way to go about it, I know. My bad for abusing Systemd to the point of
> crashing. Perhaps it was just a bit flip that caused this.
>
>    systemd[2368]: Assertion 'path_is_absolute(p)' failed at
>    src/basic/chase.c:628, function chase(). Aborting.
>    systemd[1]: Assertion 'path_is_absolute(p)' failed at
>    src/basic/chase.c:628, function chase(). Aborting.
>    systemd[1]: Caught <ABRT> from our own process.
>    systemd-coredump[32497]: Due to PID 1 having crashed coredump
>    collection will now be turned off.
>    systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0
>    dumped core.
>    systemd[1]: Caught <ABRT>, dumped core as pid 32496.
>    systemd[1]: Freezing execution.
>
>    ...
>
>    systemd-journald[871]: Failed to send stream file descriptor to
>    service manager: Transport endpoint is not connected
>
> I didn't even bother trying to produce a stack trace. I can get on that if
> anyone wants it. My machine started doing some weird things like
> Firefox not

If this is a current systemd version (v255), please generate a stack trace
and submit it to us as a GitHub issue; we'll look into it. If it's
older, please report it to your distro first.

> being able to do AJAX properly whilst still being able to load a new page,
> Chromium not being able to create a new tab whilst all the text editors
> worked just fine, and all the systemctl commands timing out. So basically, I
> was using Linux without fork(). Anyway.
> Well, I think any software can crash for any reason whatsoever. The
> problem

Yeah, a failed assert like the above is a bug we need to fix in systemd.

> with Systemd that I realised from this incident is that I had no way of
> knowing Systemd had crashed until I opened up the journal and kernel logs
> and saw that it had happened some time ago. In this particular incident,
> Systemd caught the signal and decided to just freeze. I have no idea why
> you'd want that, because if it had simply crashed, the kernel would have
> panicked and I would have realised something went wrong.
>
> 1: So I decided that I need some sort of "watchdog" that warns me when
> something like this happens. It could use D-Bus to poll the status of the
> Systemd process, and it could be a GUI app running under a seat, a daemon
> that writes a warning message using `wall`, or one that just sends mail
> using a primed-up MUA process. I wonder if someone has already had the
> same idea and gone on to make one.
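
For illustration only, a userspace check along the lines of point 1 could be sketched in Python as below. This is a sketch under stated assumptions, not an existing tool: instead of talking D-Bus directly (which needs a third-party binding), it treats `systemctl is-system-running` finishing within a timeout as a sign that PID 1 still answers requests, since a frozen manager makes systemctl calls time out as described above. The names `manager_responds()`, `alert()` and `watch()`, the 5-second timeout and the 60-second interval are all hypothetical choices. The exit status is deliberately ignored, because `is-system-running` also exits non-zero for states such as "degraded" while the manager is alive.

```python
import subprocess
import sys
import time

# Hypothetical probe: any answer at all means PID 1 responded.
PROBE = ("systemctl", "is-system-running")


def manager_responds(cmd=PROBE, timeout=5.0):
    """Return True if `cmd` finishes within `timeout` seconds.

    The exit status is ignored on purpose: is-system-running exits
    non-zero for states like "degraded" even though PID 1 answered.
    """
    try:
        subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat the manager as stuck
    except OSError:
        return False  # the probe could not even be started
    return True


def alert(message):
    """Broadcast `message` with wall(1) and also log it to stderr."""
    try:
        subprocess.run(["wall", message], timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        pass  # wall missing or stuck; stderr below still gets the message
    print(message, file=sys.stderr)


def watch(interval=60.0, cmd=PROBE, timeout=5.0):
    """Poll forever, alerting once each time the manager stops answering."""
    was_ok = True
    while True:
        ok = manager_responds(cmd, timeout)
        if was_ok and not ok:
            alert("warning: systemd (PID 1) did not answer within %.0fs" % timeout)
        was_ok = ok
        time.sleep(interval)
```

Started as, say, `watch(interval=30)` from a user session or a small service, it warns once per outage. It can only warn, not recover the machine, and on a badly wedged system it may fail to spawn the probe at all (the `OSError` branch counts that as an outage too).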

You can just use the usual hardware watchdog. If PID 1 dies, it will not
ping the hardware watchdog, and thus a reset is triggered automatically. In
fact, these days we configure the hardware watchdog by default on hardware
that has one (which is most PCs).
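
On a version or distribution where this is not already enabled, the runtime watchdog can be turned on explicitly in the manager configuration. A minimal sketch follows; the 30-second value is an arbitrary example, and the defaults differ between versions, so check systemd-system.conf(5) for the one you run.

```ini
# /etc/systemd/system.conf (excerpt)
[Manager]
# PID 1 pings the hardware watchdog; if it stops pinging within this
# interval, the device resets the machine. Example value only.
RuntimeWatchdogSec=30s
```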

> 2: How do I get Systemd to freeze to test such a program? I mean, if I kill
> Systemd, the kernel will panic, so do I have to somehow tell Systemd to
> freeze?

Not really; the kernel blocks SIGSTOP for PID 1.

Lennart

--
Lennart Poettering, Berlin
