Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread Mantas Mikulėnas
On Mon, Feb 5, 2024, 14:54 Lennart Poettering wrote:

> On So, 04.02.24 00:06, David Timber (d...@dev.snart.me) wrote:
>
> > 2: How do I get Systemd to freeze to test such program? I mean, if I kill
> > Systemd, the kernel would crash so I have to somehow tell Systemd to
> > freeze?
>
> Not really, the kernel blocks SIGSTOP for PID1.
>

Attaching gdb to pid1 should do the job.


Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread Lennart Poettering
On Sa, 03.02.24 16:55, Álvaro Cebrián Juan (acebrianj...@gmail.com) wrote:

> Great question!
>
> I am very interested in detecting systemd crashes too since I have
> experienced them recently and have been asked to come up with a solution to
> react when a PID1 crash happens.
> In fact, in my recent experiences, a journald crash was enough to render
> the system into an unreliable/degraded state in which some top-level
> applications worked while others didn't.
>
> So adding to David's 1st question, I need to detect systemd and journald
> crashes and then trigger a `systemctl reboot --force --force`
> command

As mentioned elsewhere in this thread just use RuntimeWatchdogSec= in
systemd-system.conf(5)

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread Lennart Poettering
On Mo, 05.02.24 13:54, Lennart Poettering (lenn...@poettering.net) wrote:

> you can just use the usual hw watchdog. If pid1 dies it will not ping
> the hw watchdog, and thus a reset is triggered automatically. In fact
> we actually configure the hw watchdog by default these days on hw that
> has it (which are most PCs).

Actually, we don't really, I need to correct myself. We probably
should though, dunno.

See RuntimeWatchdogSec= in systemd-system.conf(5)
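
For reference, a minimal sketch of what enabling this could look like. The drop-in file name and the 30s timeout are illustrative assumptions, not values from this thread; only the RuntimeWatchdogSec= setting itself comes from the man page mentioned above.

```ini
# /etc/systemd/system.conf.d/50-watchdog.conf  (hypothetical drop-in file name)
[Manager]
# Illustrative timeout: if PID 1 stops pinging the hardware watchdog
# for this long, the hardware resets the machine.
RuntimeWatchdogSec=30s
```

This only helps on machines that actually have a hardware watchdog device.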

>
> > 2: How do I get Systemd to freeze to test such program? I mean, if I kill
> > Systemd, the kernel would crash so I have to somehow tell Systemd to freeze?
>
> Not really, the kernel blocks SIGSTOP for PID1.
>
> Lennart
>
> --
> Lennart Poettering, Berlin

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread Lennart Poettering
On So, 04.02.24 00:06, David Timber (d...@dev.snart.me) wrote:

> Systemd crashed on me the other day. I was writing up some Systemd units and
> testing them out by daemon-reload every time I wanted to test them out. Not
> the best way to go on about, I know. My bad abusing Systemd to the point of
> crashing. Perhaps it was just a bit flip that caused this.
>
>systemd[2368]: Assertion 'path_is_absolute(p)' failed at
>src/basic/chase.c:628, function chase(). Aborting.
>systemd[1]: Assertion 'path_is_absolute(p)' failed at
>src/basic/chase.c:628, function chase(). Aborting.
>systemd[1]: Caught  from our own process.
>systemd-coredump[32497]: Due to PID 1 having crashed coredump
>collection will now be turned off.
>systemd-coredump[32497]: [] Process 32496 (systemd) of user 0
>dumped core.
>systemd[1]: Caught , dumped core as pid 32496.
>systemd[1]: Freezing execution.
>
>...
>
>systemd-journald[871]: Failed to send stream file descriptor to
>service manager: Transport endpoint is not connected
>
> I didn't even bother trying producing stack trace. I can get on that if
> anyone wants it. My machine started doing some weird things like
> Firefox not

If this is a current systemd version (v255), please generate a stack trace
and submit it as github issue to us, we'll look into it. If it's
older, please report to your distro first.

> being able to do Ajax properly whilst being able to go to a new page,
> Chromium not being able to create a new tab whilst all the text editors
> worked just fine, all the systemctl commands timing out. So basically, I was
> using Linux without fork(). Anyway.
> Well, I think any software can crash for any reason whatsoever. The
> problem

Yeah, an assert like the above is an error we need to fix in systemd.

> with Systemd I realised from this incident is that I had no way of knowing
> that Systemd had crashed until I opened up the journal and kernel logs and
> saw that Systemd had crashed some time ago. In this particular incident,
> Systemd caught the signal and decided to just freeze. No idea why you'd want
> that because if it had just crashed, the kernel would have just panicked and
> I would have realised something went wrong.
>
> 1: So I decided that I need a some sort of "watchdog" that warns me when
> something like this happens. Using dbus to poll the status of the Systemd
> process, it could be a GUI app running under a seat, just a daemon that
> writes a warning message using `wall` or just send mail using a primed up
> MUA process. I wonder if someone already had the same idea and went on to
> make one.

you can just use the usual hw watchdog. If pid1 dies it will not ping
the hw watchdog, and thus a reset is triggered automatically. In fact
we actually configure the hw watchdog by default these days on hw that
has it (which are most PCs).

> 2: How do I get Systemd to freeze to test such program? I mean, if I kill
> Systemd, the kernel would crash so I have to somehow tell Systemd to freeze?

Not really, the kernel blocks SIGSTOP for PID1.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread František Šumšal




On 2/3/24 16:06, David Timber wrote:

> Systemd crashed on me the other day. I was writing up some Systemd units and
> testing them out by daemon-reload every time I wanted to test them out. Not the
> best way to go on about, I know. My bad abusing Systemd to the point of
> crashing. Perhaps it was just a bit flip that caused this.
>
>     systemd[2368]: Assertion 'path_is_absolute(p)' failed at
>     src/basic/chase.c:628, function chase(). Aborting.
>     systemd[1]: Assertion 'path_is_absolute(p)' failed at
>     src/basic/chase.c:628, function chase(). Aborting.
>     systemd[1]: Caught  from our own process.
>     systemd-coredump[32497]: Due to PID 1 having crashed coredump
>     collection will now be turned off.
>     systemd-coredump[32497]: [] Process 32496 (systemd) of user 0
>     dumped core.
>     systemd[1]: Caught , dumped core as pid 32496.
>     systemd[1]: Freezing execution.
>
>     ...
>
>     systemd-journald[871]: Failed to send stream file descriptor to
>     service manager: Transport endpoint is not connected
>
> I didn't even bother trying producing stack trace. I can get on that if anyone
> wants it.


What you did was perfectly reasonable, systemd shouldn't just crash in that 
case. If you run the recent-ish systemd, a stack trace would be very welcome.


> My machine started doing some weird things like Firefox not being able to do
> Ajax properly whilst being able to go to a new page, Chromium not being able to
> create a new tab whilst all the text editors worked just fine, all the
> systemctl commands timing out. So basically, I was using Linux without fork().
> Anyway.
> Well, I think any software can crash for any reason whatsoever. The problem
> with Systemd I realised from this incident is that I had no way of knowing that
> Systemd had crashed until I opened up the journal and kernel logs and saw that
> Systemd had crashed some time ago. In this particular incident, Systemd caught
> the signal and decided to just freeze. No idea why you'd want that because if
> it had just crashed, the kernel would have just panicked and I would have
> realised something went wrong.
>
> 1: So I decided that I need a some sort of "watchdog" that warns me when
> something like this happens. Using dbus to poll the status of the Systemd process, it
> could be a GUI app running under a seat, just a daemon that writes a warning message
> using `wall` or just send mail using a primed up MUA process. I wonder if someone already
> had the same idea and went on to make one.
>
> 2: How do I get Systemd to freeze to test such program? I mean, if I kill
> Systemd, the kernel would crash so I have to somehow tell Systemd to freeze?


Just trigger systemd's crash handler by sending it a SIGSEGV (kill -SEGV 1).
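
Regarding question 1, a user-space probe could be sketched roughly as follows. It assumes, based on the report above that systemctl commands time out once PID 1 has frozen, that a hung `systemctl is-system-running` is a usable signal. The function names, the 5 second timeout and the wall-based notifier are illustrative choices, not an existing tool.

```python
import subprocess
import sys


def probe_manager(cmd=("systemctl", "is-system-running"), timeout=5.0):
    """Run cmd and return its stripped stdout, or None if it did not finish in time.

    Assumption: a frozen PID 1 makes systemctl hang, so a timeout is read
    as "the service manager is not answering".
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing lingers.
        return None
    return result.stdout.strip()


def notify_stderr(message):
    """Default notifier: print the warning to stderr."""
    print(message, file=sys.stderr)


def notify_wall(message):
    """Alternative notifier: broadcast the warning to logged-in terminals via wall."""
    subprocess.run(["wall", message], check=False)


def check_once(notify=notify_stderr, **probe_args):
    """Probe once; call notify() and return False if the manager did not answer."""
    state = probe_manager(**probe_args)
    if state is None:
        notify("PID 1 did not answer within the timeout; it may have crashed or frozen")
        return False
    return True
```

Something outside systemd would have to call `check_once(notify=notify_wall)` periodically, for example a plain loop started from a login session, since timers run by a frozen PID 1 cannot be relied on.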


Re: [systemd-devel] Detecting Systemd crash

2024-02-05 Thread František Šumšal



On 2/3/24 16:55, Álvaro Cebrián Juan wrote:

> Great question!
>
> I am very interested in detecting systemd crashes too since I have experienced
> them recently and have been asked to come up with a solution to react when a
> PID1 crash happens.
> In fact, in my recent experiences, a journald crash was enough to render the
> system into an unreliable/degraded state in which some top-level applications
> worked while others didn't.
>
> So adding to David's 1st question, I need to detect systemd and journald
> crashes and then trigger a `systemctl reboot --force --force` command


You can tell systemd to do just that, by setting CrashReboot=yes in system.conf 
[0][1]. It defaults to 'no' to avoid reboot loops.

[0] https://www.freedesktop.org/software/systemd/man/latest/systemd-system.conf.html#LogColor=
[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.html#systemd.crash_reboot
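
As a minimal sketch, the setting could be placed in a drop-in like the one below. The drop-in file name is a hypothetical choice; the setting and its default are as described above.

```ini
# /etc/systemd/system.conf.d/50-crash-reboot.conf  (hypothetical drop-in file name)
[Manager]
# Reboot instead of freezing when PID 1 crashes (default: no).
# Beware of reboot loops if the crash happens again on every boot.
CrashReboot=yes
```

The same switch is available on the kernel command line as systemd.crash_reboot, per [1].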



> I have also read that Linux Magic System Request Key (SysRq) can help in such
> scenarios but I don't know how they work.
>
> Any help would be very appreciated.
> Thank you.
>
> Some related links:
> https://news.ycombinator.com/item?id=19023695
> https://news.ycombinator.com/item?id=36873927
> https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html
>
> On Sat, 3 Feb 2024 at 16:14, David Timber (d...@dev.snart.me) wrote:

> > Systemd crashed on me the other day. I was writing up some Systemd units
> > and testing them out by daemon-reload every time I wanted to test them
> > out. Not the best way to go on about, I know. My bad abusing Systemd to
> > the point of crashing. Perhaps it was just a bit flip that caused this.
> >
> >      systemd[2368]: Assertion 'path_is_absolute(p)' failed at
> >      src/basic/chase.c:628, function chase(). Aborting.
> >      systemd[1]: Assertion 'path_is_absolute(p)' failed at
> >      src/basic/chase.c:628, function chase(). Aborting.
> >      systemd[1]: Caught  from our own process.
> >      systemd-coredump[32497]: Due to PID 1 having crashed coredump
> >      collection will now be turned off.
> >      systemd-coredump[32497]: [] Process 32496 (systemd) of user 0
> >      dumped core.
> >      systemd[1]: Caught , dumped core as pid 32496.
> >      systemd[1]: Freezing execution.
> >
> >      ...
> >
> >      systemd-journald[871]: Failed to send stream file descriptor to
> >      service manager: Transport endpoint is not connected
> >
> > I didn't even bother trying producing stack trace. I can get on that if
> > anyone wants it. My machine started doing some weird things like Firefox
> > not being able to do Ajax properly whilst being able to go to a new
> > page, Chromium not being able to create a new tab whilst all the text
> > editors worked just fine, all the systemctl commands timing out. So
> > basically, I was using Linux without fork(). Anyway.
> > Well, I think any software can crash for any reason whatsoever. The
> > problem with Systemd I realised from this incident is that I had no way
> > of knowing that Systemd had crashed until I opened up the journal and
> > kernel logs and saw that Systemd had crashed some time ago. In this
> > particular incident, Systemd caught the signal and decided to just
> > freeze. No idea why you'd want that because if it had just crashed, the
> > kernel would have just panicked and I would have realised something went
> > wrong.
> >
> > 1: So I decided that I need a some sort of "watchdog" that warns me when
> > something like this happens. Using dbus to poll the status of the
> > Systemd process, it could be a GUI app running under a seat, just a
> > daemon that writes a warning message using `wall` or just send mail
> > using a primed up MUA process. I wonder if someone already had the same
> > idea and went on to make one.
> >
> > 2: How do I get Systemd to freeze to test such program? I mean, if I
> > kill Systemd, the kernel would crash so I have to somehow tell Systemd
> > to freeze?



Re: [systemd-devel] systemd-pcrlock Failed to submit super PCR policy

2024-02-05 Thread Lennart Poettering
On Mo, 05.02.24 09:24, Dominick Grift (dominick.gr...@defensec.nl) wrote:

Please run "SYSTEMD_LOG_LEVEL=debug systemd-pcrlock make-policy" from
the command line, then file a github issue about this, and paste the
output there.

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] systemd-pcrlock Failed to submit super PCR policy

2024-02-05 Thread Dominick Grift


systemd v255
Debian Testing
Linux nimbus 6.6.13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1
(2024-01-20) x86_64 GNU/Linux
systemd-pcrlock

Feb 04 20:00:02 nimbus audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 
ses=4294967295 subj=sys.id:sys.role:sys.subj:s0 
msg='unit=systemd-pcrlock-make-policy comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 04 20:00:02 nimbus systemd[1]: Failed to start 
systemd-pcrlock-make-policy.service - Make TPM2 PCR Policy.
Feb 04 20:00:02 nimbus systemd[1]: systemd-pcrlock-make-policy.service: Failed 
with result 'exit-code'.
Feb 04 20:00:02 nimbus systemd[1]: systemd-pcrlock-make-policy.service: Main 
process exited, code=exited, status=1/FAILURE
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Failed to submit super PCR 
policy: State not recoverable
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Failed to add OR policy to TPM: 
tpm:parameter(1):value is out of range or is not correct for the context
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: 
ERROR:esys:src/tss2-esys/api/Esys_PolicyOR.c:100:Esys_PolicyOR() Esys Finish 
ErrorCode (0x01c4)
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: 
WARNING:esys:src/tss2-esys/api/Esys_PolicyOR.c:286:Esys_PolicyOR_Finish() 
Received TPM Error
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR Branch #1: 
a36d5b482f1c0ff2c57737c7e8c671d88f0bb2cf52140034ec4b67774eb47e87
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR Branch #0: 
2cacf1f3ded4eead1044bd14c4e519a4614c6af51a4781a89126834b7830e81b
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR policy.
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: PolicyPCR calculated digest: 
a36d5b482f1c0ff2c57737c7e8c671d88f0bb2cf52140034ec4b67774eb47e87
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: PolicyPCR calculated digest: 
2cacf1f3ded4eead1044bd14c4e519a4614c6af51a4781a89126834b7830e81b
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Session policy digest: 
b117275cc6ee990f9c572b80e67a98f133cd092029b450eda445fb1ff2454886
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Acquiring policy digest.
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting PCR hash policy.
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting PCR/OR policy for PCR 
1
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Session policy digest: 
6cc828077856fbe4333c4372ec374df31f6c3a36b2e63b778d2e2ae6b3ef532a
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Acquiring policy digest.
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR Branch #1: 
940dbe9fc9a5c4cb73e30e6454b659f8f635ebc0b6d4b327c4f98fad9bc56ccf
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR Branch #0: 
eeec8aadd13fef1af29067b499a8e9eeb82215a32a2bc838b5d5e4984c4d7100
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Submitting OR policy.
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: PolicyPCR calculated digest: 
940dbe9fc9a5c4cb73e30e6454b659f8f635ebc0b6d4b327c4f98fad9bc56ccf
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: PolicyPCR calculated digest: 
eeec8aadd13fef1af29067b499a8e9eeb82215a32a2bc838b5d5e4984c4d7100
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Session policy digest: 
eeec8aadd13fef1af29067b499a8e9eeb82215a32a2bc838b5d5e4984c4d7100
Feb 04 20:00:02 nimbus systemd-pcrlock[35974]: Acquiring policy digest.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Submitting PCR hash policy.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Submitting PCR/OR policy for PCR 0
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Session policy digest: 
af31ab03c1d2d596f518acc44424bfa26c777400bc7c4e60f883663512a84988
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Acquiring policy digest.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Submitting PCR hash policy.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Including PCR 14 in single value 
PolicyPCR expression
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Including PCR 13 in single value 
PolicyPCR expression
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Including PCR 12 in single value 
PolicyPCR expression
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Starting policy session.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Retrieving PIN from sealed data.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Starting HMAC encryption session.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Getting TPM2 capability 0x0005 
property 0x count 1.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Getting TPM2 capability 0x0008 
property 0x count 508.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Getting TPM2 capability 0x0002 
property 0x011f count 256.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Getting TPM2 capability 0x 
property 0x0001 count 127.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: TPM successfully started up.
Feb 04 20:00:01 nimbus systemd-pcrlock[35974]: Loaded TCTI module 'tcti-device' 
(TCTI module for communication with Linux kernel interface.) [Version 2]
Feb 04 20:00:01 nimbus