Re: [systemd-devel] systemctl stop going through timeout even though all processes have exited

2023-10-10 Thread Mantas Mikulėnas
Is this with cgroups v1 or v2? If cgroups v1 is involved (thanks Docker), I
recall it was a bit complex for systemd to get notified when the cgroup
actually empties – via /sys/fs/cgroup/systemd/release_agent that specifies
a helper executable that the kernel runs... I wonder if that mechanism is
broken on your system.
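
A rough way to check which hierarchy is in use, and which release agent (if any) is configured, is sketched below. It assumes GNU stat and the usual /sys/fs/cgroup mount point; cgroup_mode is only an illustrative helper name. A tmpfs at /sys/fs/cgroup suggests v1 or hybrid, while cgroup2fs means the unified v2 hierarchy.

```shell
#!/bin/bash
# Classify the cgroup setup from the filesystem type mounted at /sys/fs/cgroup.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 or hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# GNU stat: -f queries the filesystem, %T prints its type in readable form.
fstype="$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)"
echo "cgroup hierarchy: $(cgroup_mode "$fstype")"

# On cgroup v1, show the helper the kernel runs when a cgroup empties.
agent=/sys/fs/cgroup/systemd/release_agent
if [ -r "$agent" ]; then
    echo "release_agent: $(cat "$agent")"
fi
```

An empty or missing release_agent on a v1 setup would be consistent with systemd never hearing that the cgroup emptied, but that is only one possible explanation.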

On Wed, Oct 11, 2023 at 7:38 AM Martin Schwenke wrote:

> I'm seeing "systemctl stop" for several services taking a
> long time because it goes through the timeout process, even though all
> relevant processes have exited.
>
> I'll give 2 examples.  Both examples are running inside a privileged
> Rocky Linux 8.8 Docker container on a Rocky Linux 8.8 host.  The
> systemd version, reported by "systemctl --version" in the container
> is:
>
>   systemd 239 (239-74.el8_8.5)
>
> Here is ctdb.service:
>
>   [Unit]
>   Description=CTDB
>   Documentation=man:ctdbd(1) man:ctdb(7)
>   After=network-online.target time-sync.target
>   ConditionFileNotEmpty=/etc/ctdb/nodes
>
>   [Service]
>   Type=forking
>   LimitCORE=infinity
>   LimitNOFILE=1048576
>   TasksMax=4096
>   PIDFile=/var/run/ctdb/ctdbd.pid
>   ExecStart=/usr/sbin/ctdbd
>   ExecStop=/usr/bin/ctdb shutdown
>   KillMode=control-group
>   Restart=no
>
>   [Install]
>   WantedBy=multi-user.target
>
> "/usr/bin/ctdb shutdown" causes a controlled shutdown.  In many cases,
> starting and then stopping using systemctl works fine.  However, many
> times it takes >90s to stop, as per TimeoutStopSec.  If I reduce that
> value then the duration reduces accordingly.  I can confirm using both
> "ps auxfww" and "systemd-cgls" that within the container there are no
> relevant processes a moment after "systemctl stop ctdb" is run.  In
> particular, in systemd-cgls ctdb.service is gone but "systemctl stop
> ctdb" is still waiting.
>
> Before attempting to stop, the service is successfully started:
>
>   Oct 11 00:56:44 rocky1 systemd[710741]: ctdb.service: Executing: /usr/sbin/ctdbd
>   Oct 11 00:56:44 rocky1 ctdbd[710741]: CTDB logging to location file:/var/log/log.ctdb
>   Oct 11 00:56:44 rocky1 systemd[1]: Received SIGCHLD from PID 710741 (ctdbd).
>   Oct 11 00:56:44 rocky1 systemd[1]: Child 710741 (ctdbd) died (code=exited, status=0/SUCCESS)
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Child 710741 belongs to ctdb.service.
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state start.
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: New main PID 710742 belongs to service, we are happy.
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Main PID loaded: 710742
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Changed start -> running
>   Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Job ctdb.service/start finished, result=done
>   Oct 11 00:56:44 rocky1 systemd[1]: Started CTDB.
>   -- Subject: Unit ctdb.service has finished start-up
>   -- Defined-By: systemd
>   -- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
>   --
>   -- Unit ctdb.service has finished starting up.
>   --
>   -- The start-up result is done.
>
> The relevant part of the log while stopping seems to be:
>
>   Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710743 (ctdb-eventd).
>   Oct 11 00:56:47 rocky1 systemd[1]: Child 710742 (ctdbd) died (code=exited, status=0/SUCCESS)
>   Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710742 belongs to ctdb.service.
>   Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Can't open PID file /var/run/ctdb/ctdbd.pid (yet?) after stop: No such file or directory
>   Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Main process exited, code=exited, status=0/SUCCESS
>   Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=54 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
>   Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=55 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
>   Oct 11 00:56:47 rocky1 systemd[1]: Child 710743 (ctdb-eventd) died (code=exited, status=0/SUCCESS)
>   Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710743 belongs to ctdb.service.
>   Oct 11 00:56:47 rocky1 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 18 (stored), closing.
>   Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710860 (ctdb).
>   Oct 11 00:56:47 rocky1 systemd[1]: Child 710860 (ctdb) died (code=exited, status=0/SUCCESS)
>   Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710860 belongs 

[systemd-devel] systemctl stop going through timeout even though all processes have exited

2023-10-10 Thread Martin Schwenke
I'm seeing "systemctl stop" for several services taking a
long time because it goes through the timeout process, even though all
relevant processes have exited.

I'll give 2 examples.  Both examples are running inside a privileged
Rocky Linux 8.8 Docker container on a Rocky Linux 8.8 host.  The
systemd version, reported by "systemctl --version" in the container
is:

  systemd 239 (239-74.el8_8.5)

Here is ctdb.service:

  [Unit]
  Description=CTDB
  Documentation=man:ctdbd(1) man:ctdb(7)
  After=network-online.target time-sync.target
  ConditionFileNotEmpty=/etc/ctdb/nodes

  [Service]
  Type=forking
  LimitCORE=infinity
  LimitNOFILE=1048576
  TasksMax=4096
  PIDFile=/var/run/ctdb/ctdbd.pid
  ExecStart=/usr/sbin/ctdbd
  ExecStop=/usr/bin/ctdb shutdown
  KillMode=control-group
  Restart=no

  [Install]
  WantedBy=multi-user.target

"/usr/bin/ctdb shutdown" causes a controlled shutdown.  In many cases,
starting and then stopping using systemctl works fine.  However, many
times it takes >90s to stop, as per TimeoutStopSec.  If I reduce that
value then the duration reduces accordingly.  I can confirm using both
"ps auxfww" and "systemd-cgls" that within the container there are no
relevant processes a moment after "systemctl stop ctdb" is run.  In
particular, in systemd-cgls ctdb.service is gone but "systemctl stop
ctdb" is still waiting.
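
One way to double-check the "no relevant processes" observation from the cgroup side is to read the unit's cgroup.procs directly while "systemctl stop" is still waiting. This is only a sketch: list_cgroup_procs is a made-up helper name, and the two hierarchy roots tried are the usual v1 (/sys/fs/cgroup/systemd) and v2 (/sys/fs/cgroup) locations, which may differ inside a container.

```shell
#!/bin/bash
# Print the PIDs listed in a cgroup.procs file, or "empty" if there are none.
# cgroupfs files report a size of 0, so read the content instead of testing size.
list_cgroup_procs() {
    local pids
    pids="$(cat "$1" 2>/dev/null)"
    if [ -n "$pids" ]; then
        echo "$pids"
    else
        echo "empty"
    fi
}

unit="${1:-ctdb.service}"
# ControlGroup is the unit's cgroup path relative to the hierarchy root.
cg="$(systemctl show -p ControlGroup --value "$unit" 2>/dev/null)"
for root in /sys/fs/cgroup/systemd /sys/fs/cgroup; do
    if [ -n "$cg" ] && [ -e "$root$cg/cgroup.procs" ]; then
        echo "$root$cg/cgroup.procs: $(list_cgroup_procs "$root$cg/cgroup.procs")"
    fi
done
```

If this prints "empty" (or nothing at all, because the cgroup is already gone) while the stop job is still pending, that would match what systemd-cgls shows.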

Before attempting to stop, the service is successfully started:

  Oct 11 00:56:44 rocky1 systemd[710741]: ctdb.service: Executing: /usr/sbin/ctdbd
  Oct 11 00:56:44 rocky1 ctdbd[710741]: CTDB logging to location file:/var/log/log.ctdb
  Oct 11 00:56:44 rocky1 systemd[1]: Received SIGCHLD from PID 710741 (ctdbd).
  Oct 11 00:56:44 rocky1 systemd[1]: Child 710741 (ctdbd) died (code=exited, status=0/SUCCESS)
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Child 710741 belongs to ctdb.service.
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state start.
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: New main PID 710742 belongs to service, we are happy.
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Main PID loaded: 710742
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Changed start -> running
  Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Job ctdb.service/start finished, result=done
  Oct 11 00:56:44 rocky1 systemd[1]: Started CTDB.
  -- Subject: Unit ctdb.service has finished start-up
  -- Defined-By: systemd
  -- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
  -- 
  -- Unit ctdb.service has finished starting up.
  -- 
  -- The start-up result is done.

The relevant part of the log while stopping seems to be:

  Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710743 (ctdb-eventd).
  Oct 11 00:56:47 rocky1 systemd[1]: Child 710742 (ctdbd) died (code=exited, status=0/SUCCESS)
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710742 belongs to ctdb.service.
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Can't open PID file /var/run/ctdb/ctdbd.pid (yet?) after stop: No such file or directory
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Main process exited, code=exited, status=0/SUCCESS
  Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=54 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
  Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=55 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
  Oct 11 00:56:47 rocky1 systemd[1]: Child 710743 (ctdb-eventd) died (code=exited, status=0/SUCCESS)
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710743 belongs to ctdb.service.
  Oct 11 00:56:47 rocky1 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 18 (stored), closing.
  Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710860 (ctdb).
  Oct 11 00:56:47 rocky1 systemd[1]: Child 710860 (ctdb) died (code=exited, status=0/SUCCESS)
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710860 belongs to ctdb.service.
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state stop.
  Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Changed stop -> stop-sigterm
  Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=56 reply_cookie=0

Re: [systemd-devel] How to make an encrypted disk mentioned in /etc/crypttab "optional"?

2023-10-10 Thread Aaron Rainbolt
I figured out how to do this, sorta. I ended up bypassing the 
systemd-cryptsetup mechanism entirely and instead wrote my own shell 
script and systemd unit that did things exactly the way I wanted. Now, if
the user provides the right password, their home dir is mounted; if they
type a wrong one, they're asked for their password again; and if they
type "guest", the system boots without mounting the encrypted home dir
and uses the ramdisk-backed one instead.


In case anyone's interested, my systemd unit is:

[Unit]
Description=Mount encrypted home
Before=display-manager.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/local/bin/firebook_homemount.sh
[Install]
WantedBy=multi-user.target

And the corresponding script is:

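#!/bin/bash
# Mounts an encrypted home dir (or fails gracefully)
while true; do
    # --no-tty: never prompt on the TTY; ask via the password agents instead.
    decryptPassword="$(systemd-ask-password --no-tty "Please provide your user password (or type \"guest\" to enter guest mode)")"

    if [ "${decryptPassword}" = "guest" ]; then
        # Guest mode: leave the ramdisk-backed /home in place.
        break
    fi

    # cryptsetup reads the passphrase from stdin when stdin is not a terminal.
    if printf '%s\n' "${decryptPassword}" | cryptsetup luksOpen /dev/disk/by-label/firebook-crypt firebook-home; then
        mount /dev/mapper/firebook-home /home
        break
    fi
    # Wrong password: loop and ask again.
done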

I also masked systemd-ask-password-console.service and 
systemd-ask-password-wall.service so that if the user enters a wrong 
password, they're asked for their password within Plymouth repeatedly 
(whereas originally if the user flunked their password at Plymouth, 
systemd-ask-password would default to using 
systemd-ask-password-wall.service next). This seems to be working so far.


On 10/9/23 02:10, Aaron Rainbolt wrote:

Good morning/evening, and thanks for your time.

I'm attempting to create a Fedora-based immutable distro (not based on 
Silverblue) that stores user data in an encrypted /home partition. The 
goal is to have something that behaves somewhat similar to Chrome OS. 
One feature I'm attempting to implement is a "guest mode", whereby a 
user can sign into the system without providing any password, but if 
they do so they don't gain access to the system's owner's data and 
virtually anything they do is erased upon shutdown.


In order to do this, I have two /home directories - one is part of the 
(immutable) root filesystem, which can only be written to thanks to a 
Dracut-created ramdisk overlay. The other is stored in an encrypted 
partition. I'm using a crypttab line like this to prepare the 
encrypted partition:


    firebook-home LABEL=firebook-crypt none luks,discard,nofail

And I'm using an fstab line like this to mount it:

    /dev/mapper/firebook-home /home ext4 defaults,nofail 0 0

Note that I've marked both of these with "nofail" - the goal is that 
the user will be prompted for their password by systemd upon boot, but 
if they do not provide the password (by intentionally providing a 
wrong password three times), the encrypted drive should not be mounted 
and the system should boot normally using the ephemeral home directory 
provided by the root filesystem + ramdisk overlay.


This seems to be *almost* working, however if I intentionally provide 
a wrong password to the password prompt a few times, it doesn't 
actually "give up" on getting a password from me. What it does instead 
is it stops asking me for the password at boot, but then rather than 
starting GNOME it just leaves me at a console screen. If I am able to 
get GDM to appear somehow, I can't sign in.


What I end up doing is switching to a TTY, signing in, and then 
elevating to root to troubleshoot. Once I've elevated to root, I get a 
`wall` message informing me that the system is *still waiting* for a 
password and that I need to run `systemd-tty-ask-password-agent` (I 
think?) to provide it. If I go ahead and do this, then restart GDM, 
I'm able to sign in after that. (I could be wrong about what command 
it's asking me to run, but I think it was 
`systemd-tty-ask-password-agent`.)


From my research, it looks like systemd is refusing to ever truly 
"give up" on getting the password for the encrypted /home directory, 
despite the use of `nofail` in the fstab and crypttab files. I'm not 
finding any documentation on how to get systemd to "give up" on 
getting the password. For my particular use case, I'd like systemd to 
just forget that the encrypted drive exists at all if the wrong 
password is given. If the user wants to mount the encrypted drive 
after that, they should either reboot or use cryptsetup manually.


Is there any way to make systemd "give up" on getting a password?

Thanks for your help!

Aaron



Re: [systemd-devel] Help! Reached target Local File Systems order is incorrect

2023-10-10 Thread Lennart Poettering
On Mon, 09.10.23 12:07, Tony Rodriguez (unixpro1...@gmail.com) wrote:

> Created a service that invokes a "systemctl daemon-reload". Goal is for a
> reload to occur early in the boot process, before other user made services
> are invoked.  During additional testing, sometimes it is correct and other
> times it is out of order (incorrect -  See steps C).  It may work for 5 or 6
> times after each reboot/shutdown, then randomly become incorrect. How can I
> make this more consistent? Already tried various keyword combinations
> (wants,before,after, etc) within the unit file, all without luck.
> Thought about something like "After=var.mount, etc" as well, but that is
> inflexible because I will not know filesystems users may create.
>
> A) Unit file
>
> [Unit]
> Description=Systemctl-Reload
> Wants=local-fs.target
> DefaultDependencies=yes
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/bin/systemctl daemon-reload
>
> [Install]
> WantedBy=local-fs.target
>
> B)  Correct order: ** Reached target Local File Systems is after all
> mounting is done. Sometimes it works.

You have not defined any ordering in the unit file, i.e. neither After=
nor Before=. Hence it's going to be executed as quickly as possible
during boot.

See docs:

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Before=
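
As a rough illustration of adding explicit ordering, the sketch below writes a drop-in containing After=local-fs.target for a unit. The helper name write_ordering_dropin, the drop-in file name, and the example unit name are all assumptions made for illustration; whether After=local-fs.target is the right ordering depends on what the service actually needs.

```shell
#!/bin/bash
# Write a drop-in that orders a unit after local-fs.target.
# $1: unit name (hypothetical in the example below)
# $2: unit directory, defaulting to /etc/systemd/system
write_ordering_dropin() {
    local unit="$1" etc_dir="${2:-/etc/systemd/system}"
    local dir="$etc_dir/$unit.d"
    mkdir -p "$dir"
    printf '[Unit]\nAfter=local-fs.target\n' > "$dir/10-ordering.conf"
    # Print the path of the file that was written.
    echo "$dir/10-ordering.conf"
}

# Example (as root), followed by "systemctl daemon-reload":
#   write_ordering_dropin systemctl-reload.service
```

A matching Before= line naming the services that must start later could be added to the same [Unit] section in the same way.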

Generally, though, it's recommended not to reload PID 1 configuration
during the initial transaction if avoidable. A better approach is to
write a generator, which can augment the set of units and their
dependencies before the first transaction is put together.

https://www.freedesktop.org/software/systemd/man/systemd.generator.html
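
To give an idea of what a generator looks like, here is a minimal sketch. Generators are executables (for example under /etc/systemd/system-generators/) that systemd runs early at boot and on each reload, passing three output directories as arguments. The unit name example-user.service, the drop-in file name, and the ordering it adds are hypothetical, chosen only to show the mechanics.

```shell
#!/bin/bash
# Sketch of a systemd generator. It is invoked with three arguments:
#   $1 normal-priority output dir, $2 early dir, $3 late dir
# "example-user.service" is a hypothetical unit name.
generate() {
    local normal_dir="$1"
    local dropin_dir="$normal_dir/example-user.service.d"
    mkdir -p "$dropin_dir"
    # Add ordering to the unit before the first transaction is computed.
    printf '[Unit]\nAfter=local-fs.target\n' > "$dropin_dir/50-generated.conf"
}

# Only act when called the way systemd calls generators.
if [ "$#" -eq 3 ]; then
    generate "$1"
fi
```

Since the output is in place before the initial transaction is computed, no daemon-reload during boot should be needed for it to take effect.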

Lennart

--
Lennart Poettering, Berlin