Re: [systemd-devel] systemctl stop going through timeout even though all processes have exited
Is this with cgroups v1 or v2? If cgroups v1 is involved (thanks Docker), I recall it was a bit complex for systemd to get notified when the cgroup actually empties, via /sys/fs/cgroup/systemd/release_agent, which specifies a helper executable that the kernel runs. I wonder if that mechanism is broken on your system.

On Wed, Oct 11, 2023 at 7:38 AM Martin Schwenke wrote:
> I'm seeing "systemctl stop " for several services taking a
> long time because it goes through the timeout process, even though all
> relevant processes have exited.
>
> I'll give 2 examples. Both examples are running inside a privileged
> Rocky Linux 8.8 Docker container on a Rocky Linux 8.8 host. The
> systemd version, reported by "systemctl --version" in the container
> is:
>
> systemd 239 (239-74.el8_8.5)
>
> Here is ctdb.service:
>
> [Unit]
> Description=CTDB
> Documentation=man:ctdbd(1) man:ctdb(7)
> After=network-online.target time-sync.target
> ConditionFileNotEmpty=/etc/ctdb/nodes
>
> [Service]
> Type=forking
> LimitCORE=infinity
> LimitNOFILE=1048576
> TasksMax=4096
> PIDFile=/var/run/ctdb/ctdbd.pid
> ExecStart=/usr/sbin/ctdbd
> ExecStop=/usr/bin/ctdb shutdown
> KillMode=control-group
> Restart=no
>
> [Install]
> WantedBy=multi-user.target
>
> "/usr/bin/ctdb shutdown" causes a controlled shutdown. In many cases,
> starting and then stopping using systemctl works fine. However, many
> times it takes >90s to stop, as per TimeoutStopSec. If I reduce that
> value then the duration reduces accordingly. I can confirm using both
> "ps auxfww" and "systemd-cgls" that within the container there are no
> relevant processes a moment after "systemctl stop ctdb" is run. In
> particular, in systemd-cgls ctdb.service is gone but "systemctl stop
> ctdb" is still waiting.
>
> Before attempting to stop, the service is successfully started:
>
> Oct 11 00:56:44 rocky1 systemd[710741]: ctdb.service: Executing: /usr/sbin/ctdbd
> Oct 11 00:56:44 rocky1 ctdbd[710741]: CTDB logging to location file:/var/log/log.ctdb
> Oct 11 00:56:44 rocky1 systemd[1]: Received SIGCHLD from PID 710741 (ctdbd).
> Oct 11 00:56:44 rocky1 systemd[1]: Child 710741 (ctdbd) died (code=exited, status=0/SUCCESS)
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Child 710741 belongs to ctdb.service.
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state start.
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: New main PID 710742 belongs to service, we are happy.
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Main PID loaded: 710742
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Changed start -> running
> Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Job ctdb.service/start finished, result=done
> Oct 11 00:56:44 rocky1 systemd[1]: Started CTDB.
> -- Subject: Unit ctdb.service has finished start-up
> -- Defined-By: systemd
> -- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit ctdb.service has finished starting up.
> --
> -- The start-up result is done.
>
> The relevant part of the log while stopping seems to be:
>
> Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710743 (ctdb-eventd).
> Oct 11 00:56:47 rocky1 systemd[1]: Child 710742 (ctdbd) died (code=exited, status=0/SUCCESS)
> Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710742 belongs to ctdb.service.
> Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Can't open PID file /var/run/ctdb/ctdbd.pid (yet?) after stop: No such file or directory
> Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Main process exited, code=exited, status=0/SUCCESS
> Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=54 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
> Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=55 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
> Oct 11 00:56:47 rocky1 systemd[1]: Child 710743 (ctdb-eventd) died (code=exited, status=0/SUCCESS)
> Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710743 belongs to ctdb.service.
> Oct 11 00:56:47 rocky1 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 18 (stored), closing.
> Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710860 (ctdb).
> Oct 11 00:56:47 rocky1 systemd[1]: Child 710860 (ctdb) died (code=exited, status=0/SUCCESS)
> Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710860 belongs
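
One way to answer the v1-versus-v2 question raised in the reply is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU stat is available; the function names classify_cgroup_fs and report_cgroup_setup are made up for illustration, and treating tmpfs as "v1" reflects the usual legacy/hybrid layout rather than a guarantee. On v1 it also prints the release_agent file mentioned in the reply, if it is readable.

```shell
#!/bin/bash
# Map a filesystem type name (as printed by `stat -f -c %T`) to a cgroup mode.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;        # unified hierarchy
        tmpfs)     echo "v1" ;;        # legacy or hybrid: controllers mounted under a tmpfs
        *)         echo "unknown" ;;
    esac
}

# Report the mode for this system and, on v1, the release agent of the
# name=systemd hierarchy (path taken from the reply above).
report_cgroup_setup() {
    local fstype mode agent
    fstype="$(stat -f -c %T /sys/fs/cgroup 2>/dev/null)"
    mode="$(classify_cgroup_fs "$fstype")"
    echo "cgroup mode: $mode (fstype: ${fstype:-n/a})"
    agent=/sys/fs/cgroup/systemd/release_agent
    if [ "$mode" = "v1" ] && [ -r "$agent" ]; then
        echo "release_agent: $(cat "$agent")"
    fi
    return 0
}

report_cgroup_setup
```

Running this both on the host and inside the container may be worthwhile, since the two can see different cgroup mounts.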
[systemd-devel] systemctl stop going through timeout even though all processes have exited
I'm seeing "systemctl stop " for several services taking a long time because it goes through the timeout process, even though all relevant processes have exited.

I'll give 2 examples. Both examples are running inside a privileged Rocky Linux 8.8 Docker container on a Rocky Linux 8.8 host. The systemd version, reported by "systemctl --version" in the container is:

systemd 239 (239-74.el8_8.5)

Here is ctdb.service:

[Unit]
Description=CTDB
Documentation=man:ctdbd(1) man:ctdb(7)
After=network-online.target time-sync.target
ConditionFileNotEmpty=/etc/ctdb/nodes

[Service]
Type=forking
LimitCORE=infinity
LimitNOFILE=1048576
TasksMax=4096
PIDFile=/var/run/ctdb/ctdbd.pid
ExecStart=/usr/sbin/ctdbd
ExecStop=/usr/bin/ctdb shutdown
KillMode=control-group
Restart=no

[Install]
WantedBy=multi-user.target

"/usr/bin/ctdb shutdown" causes a controlled shutdown. In many cases, starting and then stopping using systemctl works fine. However, many times it takes >90s to stop, as per TimeoutStopSec. If I reduce that value then the duration reduces accordingly. I can confirm using both "ps auxfww" and "systemd-cgls" that within the container there are no relevant processes a moment after "systemctl stop ctdb" is run. In particular, in systemd-cgls ctdb.service is gone but "systemctl stop ctdb" is still waiting.

Before attempting to stop, the service is successfully started:

Oct 11 00:56:44 rocky1 systemd[710741]: ctdb.service: Executing: /usr/sbin/ctdbd
Oct 11 00:56:44 rocky1 ctdbd[710741]: CTDB logging to location file:/var/log/log.ctdb
Oct 11 00:56:44 rocky1 systemd[1]: Received SIGCHLD from PID 710741 (ctdbd).
Oct 11 00:56:44 rocky1 systemd[1]: Child 710741 (ctdbd) died (code=exited, status=0/SUCCESS)
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Child 710741 belongs to ctdb.service.
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state start.
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: New main PID 710742 belongs to service, we are happy.
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Main PID loaded: 710742
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Changed start -> running
Oct 11 00:56:44 rocky1 systemd[1]: ctdb.service: Job ctdb.service/start finished, result=done
Oct 11 00:56:44 rocky1 systemd[1]: Started CTDB.
-- Subject: Unit ctdb.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ctdb.service has finished starting up.
--
-- The start-up result is done.

The relevant part of the log while stopping seems to be:

Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710743 (ctdb-eventd).
Oct 11 00:56:47 rocky1 systemd[1]: Child 710742 (ctdbd) died (code=exited, status=0/SUCCESS)
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710742 belongs to ctdb.service.
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Can't open PID file /var/run/ctdb/ctdbd.pid (yet?) after stop: No such file or directory
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Main process exited, code=exited, status=0/SUCCESS
Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=54 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=55 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Oct 11 00:56:47 rocky1 systemd[1]: Child 710743 (ctdb-eventd) died (code=exited, status=0/SUCCESS)
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710743 belongs to ctdb.service.
Oct 11 00:56:47 rocky1 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 18 (stored), closing.
Oct 11 00:56:47 rocky1 systemd[1]: Received SIGCHLD from PID 710860 (ctdb).
Oct 11 00:56:47 rocky1 systemd[1]: Child 710860 (ctdb) died (code=exited, status=0/SUCCESS)
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Child 710860 belongs to ctdb.service.
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Control process exited, code=exited status=0
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Got final SIGCHLD for state stop.
Oct 11 00:56:47 rocky1 systemd[1]: ctdb.service: Changed stop -> stop-sigterm
Oct 11 00:56:47 rocky1 systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/ctdb_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=56 reply_cookie=0
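
The observation that ctdb.service has disappeared from systemd-cgls can be cross-checked against the cgroup filesystem directly. This is a minimal sketch, assuming the unit's cgroup directory sits at one of the conventional locations (/sys/fs/cgroup/systemd/system.slice/ctdb.service on v1, /sys/fs/cgroup/system.slice/ctdb.service on v2); cgroup_state is a hypothetical helper name, and the paths may well differ inside a container.

```shell
#!/bin/bash
# Print "gone" if the cgroup directory does not exist, "empty" if it exists
# but lists no PIDs, or "populated:<n>" if n PIDs are still attached.
cgroup_state() {
    local dir="$1" count
    if [ ! -d "$dir" ]; then
        echo "gone"
        return 0
    fi
    # cgroup.procs holds one PID per line; a missing or unreadable file counts as 0.
    count="$(grep -c . "$dir/cgroup.procs" 2>/dev/null)"
    if [ "${count:-0}" -eq 0 ]; then
        echo "empty"
    else
        echo "populated:$count"
    fi
}

# Assumed locations of the unit's cgroup on v1 and v2 respectively.
for dir in /sys/fs/cgroup/systemd/system.slice/ctdb.service \
           /sys/fs/cgroup/system.slice/ctdb.service; do
    echo "$dir: $(cgroup_state "$dir")"
done
```

Seeing "empty" or "gone" while "systemctl stop ctdb" is still waiting would be consistent with the report above, though it does not by itself show why systemd keeps waiting.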
Re: [systemd-devel] How to make an encrypted disk mentioned in /etc/crypttab "optional"?
I figured out how to do this, sorta. I ended up bypassing the systemd-cryptsetup mechanism entirely and instead wrote my own shell script and systemd unit that did things exactly the way I wanted. Now if the user provides the right password, their home dir is mounted, if they type a wrong one they're asked for their password again, and if they type "guest" the system boots without mounting the encrypted home dir and uses the ramdisk-backed one instead.

In case anyone's interested, my systemd unit is:

[Unit]
Description=Mount encrypted home
Before=display-manager.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/local/bin/firebook_homemount.sh

[Install]
WantedBy=multi-user.target

And the corresponding script is:

#!/bin/bash
# Mounts an encrypted home dir (or fails gracefully)
while true; do
    decryptPassword="$(systemd-ask-password --no-tty "Please provide your user password (or type \"guest\" to enter guest mode)")"
    if [ "${decryptPassword}" = "guest" ]; then
        break
    else
        echo "${decryptPassword}" | cryptsetup luksOpen /dev/disk/by-label/firebook-crypt firebook-home
        if [ "$?" = "0" ]; then
            mount /dev/mapper/firebook-home /home
            break
        fi
    fi
done

I also masked systemd-ask-password-console.service and systemd-ask-password-wall.service so that if the user enters a wrong password, they're asked for their password within Plymouth repeatedly (whereas originally if the user flunked their password at Plymouth, systemd-ask-password would default to using systemd-ask-password-wall.service next). This seems to be working so far.

On 10/9/23 02:10, Aaron Rainbolt wrote:

Good morning/evening, and thanks for your time.

I'm attempting to create a Fedora-based immutable distro (not based on Silverblue) that stores user data in an encrypted /home partition. The goal is to have something that behaves somewhat similar to Chrome OS.
One feature I'm attempting to implement is a "guest mode", whereby a user can sign into the system without providing any password, but if they do so they don't gain access to the system's owner's data and virtually anything they do is erased upon shutdown. In order to do this, I have two /home directories - one is part of the (immutable) root filesystem, which can only be written to thanks to a Dracut-created ramdisk overlay. The other is stored in an encrypted partition.

I'm using a crypttab line like this to prepare the encrypted partition:

firebook-home LABEL=firebook-crypt none luks,discard,nofail

And I'm using an fstab line like this to mount it:

/dev/mapper/firebook-home /home ext4 defaults,nofail 0 0

Note that I've marked both of these with "nofail" - the goal is that the user will be prompted for their password by systemd upon boot, but if they do not provide the password (by intentionally providing a wrong password three times), the encrypted drive should not be mounted and the system should boot normally using the ephemeral home directory provided by the root filesystem + ramdisk overlay.

This seems to be *almost* working, however if I intentionally provide a wrong password to the password prompt a few times, it doesn't actually "give up" on getting a password from me. What it does instead is it stops asking me for the password at boot, but then rather than starting GNOME it just leaves me at a console screen. If I am able to get GDM to appear somehow, I can't sign in. What I end up doing is switching to a TTY, signing in, and then elevating to root to troubleshoot. Once I've elevated to root, I get a `wall` message informing me that the system is *still waiting* for a password and that I need to run `systemd-tty-ask-password-agent` (I think?) to provide it. If I go ahead and do this, then restart GDM, I'm able to sign in after that. (I could be wrong about what command it's asking me to run, but I think it was `systemd-tty-ask-password-agent`.)
From my research, it looks like systemd is refusing to ever truly "give up" on getting the password for the encrypted /home directory, despite the use of `nofail` in the fstab and crypttab files. I'm not finding any documentation on how to get systemd to "give up" on getting the password. For my particular use case, I'd like systemd to just forget that the encrypted drive exists at all if the wrong password is given. If the user wants to mount the encrypted drive after that, they should either reboot or use cryptsetup manually.

Is there any way to make systemd "give up" on getting a password?

Thanks for your help!
Aaron
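
For the "give up after a few wrong passwords" behaviour asked about here, the loop from the script earlier in this thread could be bounded. The following is only a sketch of that idea, not the script actually in use: unlock_home, ask_real and unlock_real are hypothetical names, the return codes are an arbitrary convention, and the prompt and unlock steps are passed in by name so the loop can be tried without systemd or a real LUKS device.

```shell
#!/bin/bash
# Ask for a password at most max_attempts times.
# Return codes: 0 = unlocked, 1 = guest mode requested, 2 = prompt failed,
# 3 = gave up after max_attempts wrong passwords.
# ask_cmd and unlock_cmd are hypothetical hooks, passed in by name.
unlock_home() {
    local max_attempts="$1" ask_cmd="$2" unlock_cmd="$3"
    local attempt=1 password
    while [ "$attempt" -le "$max_attempts" ]; do
        password="$("$ask_cmd")" || return 2
        if [ "$password" = "guest" ]; then
            return 1
        fi
        # The password is fed to the unlock hook on stdin.
        if printf '%s\n' "$password" | "$unlock_cmd"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 3
}

# Hooks built from the commands used in the script above.
ask_real() {
    systemd-ask-password --no-tty "Please provide your user password (or type \"guest\" to enter guest mode)"
}
unlock_real() {
    cryptsetup luksOpen /dev/disk/by-label/firebook-crypt firebook-home
}
```

A caller might then run "unlock_home 3 ask_real unlock_real && mount /dev/mapper/firebook-home /home" and treat any non-zero result as "continue booting with the ramdisk-backed home".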
Re: [systemd-devel] Help! Reached target Local File Systems order is incorrect
On Mo, 09.10.23 12:07, Tony Rodriguez (unixpro1...@gmail.com) wrote:

> Created a service that invokes a "systemctl daemon-reload". Goal is for a
> reload to occur early in the boot process, before other user made services
> are invoked. During additional testing, sometimes it is correct and other
> times it is out of order (incorrect - See steps C). It may work for 5 or 6
> times after each reboot/shutdown, then randomly become incorrect. How can I
> make this more consistent? Already tried various keyword combinations
> (wants,before,after, etc) within the unit file, all without luck.
> Thought about something like "After=var.mount, etc" as well, but that is
> inflexible because I will not know filesystems users may create.
>
> A) Unit file
>
> [Unit]
> Description=Systemctl-Reload
> Wants=local-fs.target
> DefaultDependencies=yes
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/bin/systemctl daemon-reload
>
> [Install]
> WantedBy=local-fs.target
>
> B) Correct order: ** Reached target Local File Systems is after all
> mounting is done. Sometimes it works.

You have not defined any ordering in the unit file, i.e. neither After= nor Before=. Hence it's going to be executed as quickly as possible during boot. See docs:

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Before=

Generally though it's recommended not to reload PID 1 configuration during the initial transaction if avoidable. Better approaches are to put together generators or so, which can augment the set of units and their dependencies already when the first transaction is put together.

https://www.freedesktop.org/software/systemd/man/systemd.generator.html

Lennart

--
Lennart Poettering, Berlin
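
To make the generator suggestion a little more concrete, here is a rough sketch of what such a generator could look like. It is not taken from this thread: example-early.service and its contents are hypothetical, and what a real generator should emit depends entirely on the use case. The parts that follow the documented interface are that systemd invokes generators with three output directories (normal, early, late) and that unit files and .wants/ symlinks written there become part of the first transaction.

```shell
#!/bin/bash
# Sketch of a systemd generator. systemd runs generators very early and
# passes three output directories: normal, early and late.
# example-early.service is a hypothetical unit name.
generate_units() {
    local normal_dir="$1"
    local unit=example-early.service

    mkdir -p "$normal_dir/multi-user.target.wants"

    # Emit the unit with an explicit ordering dependency.
    cat > "$normal_dir/$unit" <<'EOF'
[Unit]
Description=Example unit emitted by a generator (hypothetical)
After=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
EOF

    # Pull the unit into the boot transaction, as "systemctl enable" would.
    ln -sf "../$unit" "$normal_dir/multi-user.target.wants/$unit"
}

# Only act when called the way systemd calls a generator.
if [ "$#" -eq 3 ]; then
    generate_units "$1"
fi
```

Installed as an executable under /etc/systemd/system-generators/, a script along these lines would run before units are loaded, so no daemon-reload during boot would be needed for the generated unit to be picked up.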