Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Do, 28.01.21 10:08, Martin Wilck (mwi...@suse.com) wrote:

> Hi Lennart,
>
> thanks again.
>
> On Wed, 2021-01-27 at 23:56 +0100, Lennart Poettering wrote:
> > On Mi, 27.01.21 21:51, Martin Wilck (mwi...@suse.com) wrote:
> >
> > if you want the initrd environment to fully continue to exist,
>
> I don't. I just need /sys and /dev (and perhaps /proc and /run, too) to
> remain accessible. I believe most root storage daemons will need this.
>
> > consider creating a new mount namespace, bind mount the initrd root
> > into it recursively to some new dir you created. Then afterwards mark
> > that mount MS_PRIVATE. then pivot_root()+chroot()+chdir() into your
> > new old world.
>
> And on exit, I'd need to tear all that down again, right? I don't want
> my daemon to block shutdown because some file systems haven't been
> cleanly unmounted.

If you don't need the initrd root, i.e. don't intend to open any further
files, then you can just mount an empty tmpfs to your tempdir, mount
proc/sys into it, then transition your process into it and forget about
the rest.

Lennart

--
Lennart Poettering, Berlin

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
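[Editor's note: the minimal variant described above could be sketched roughly as follows. Python is used here only for brevity; a real storage daemon would make the same unshare(2)/mount(2)/pivot_root(2) calls in C. Everything in this sketch is illustrative, not the actual systemd mechanism: it unshares a mount namespace, mounts an empty tmpfs, re-mounts /proc and /sys into it, and pivots in. It needs CAP_SYS_ADMIN and Python >= 3.10 (for os.pivot_root), so it skips itself when run unprivileged.]

```python
import ctypes
import ctypes.util
import os
import tempfile

libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)

CLONE_NEWNS = 0x00020000             # new mount namespace
MS_REC, MS_PRIVATE = 0x4000, 1 << 18

def mount(src, tgt, fstype, flags=0):
    # Thin wrapper around mount(2); src/fstype may be None (NULL).
    ret = libc.mount(src and src.encode(), tgt.encode(),
                     fstype and fstype.encode(), flags, None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), tgt)

def transition_to_empty_tmpfs():
    # 1. Private mount namespace: our mounts stay invisible to the host.
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "unshare(CLONE_NEWNS): " + os.strerror(err))
    mount(None, "/", None, MS_REC | MS_PRIVATE)   # stop mount propagation
    # 2. An empty tmpfs becomes the daemon's new root.
    new_root = tempfile.mkdtemp()
    mount("tmpfs", new_root, "tmpfs")
    # 3. Re-mount the API filesystems the daemon still needs.
    for name, fstype in (("proc", "proc"), ("sys", "sysfs")):
        os.mkdir(os.path.join(new_root, name))
        mount(fstype, os.path.join(new_root, name), fstype)
    # 4. Pivot into it and drop all references to the initrd root.
    os.mkdir(os.path.join(new_root, "oldroot"))
    os.chdir(new_root)
    os.pivot_root(".", "oldroot")    # Python >= 3.10
    os.chroot(".")
    os.chdir("/")
    return sorted(os.listdir("/"))

if os.geteuid() != 0 or not hasattr(os, "pivot_root"):
    outcome = "skipped: needs CAP_SYS_ADMIN and Python >= 3.10"
else:
    try:
        outcome = "new root contains: " + " ".join(transition_to_empty_tmpfs())
    except OSError as exc:
        outcome = "failed: " + str(exc)
print(outcome)
```

A daemon doing this at the right moment keeps /proc and /sys reachable after the switch root while holding no open references to the initrd filesystem, so nothing blocks shutdown.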
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
Hi Lennart,

thanks again.

On Wed, 2021-01-27 at 23:56 +0100, Lennart Poettering wrote:
> On Mi, 27.01.21 21:51, Martin Wilck (mwi...@suse.com) wrote:
>
> if you want the initrd environment to fully continue to exist,

I don't. I just need /sys and /dev (and perhaps /proc and /run, too) to
remain accessible. I believe most root storage daemons will need this.

> consider creating a new mount namespace, bind mount the initrd root
> into it recursively to some new dir you created. Then afterwards mark
> that mount MS_PRIVATE. then pivot_root()+chroot()+chdir() into your
> new old world.

And on exit, I'd need to tear all that down again, right? I don't want
my daemon to block shutdown because some file systems haven't been
cleanly unmounted.

Regards,
Martin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Tue, 2021-01-26 at 11:33 +0100, Lennart Poettering wrote:
> > [Unit]
> > Description=NVMe Event Monitor for Automatic Subsystem Connection
> > Documentation=man:nvme-monitor(1)
> > DefaultDependencies=false
> > Conflicts=shutdown.target
> > Requires=systemd-udevd-kernel.socket
> > After=systemd-udevd-kernel.socket
>
> Why do you require this?

Brain fart on my part. I need to connect to the kernel socket, but that
doesn't require the systemd unit.

> My guess: the socket unit gets shut down, and since you have Requires=
> on it you thus go away too.

That was it, thanks a lot. So obvious in hindsight :-/

Meanwhile I've looked a bit deeper into the problems accessing "/dev"
that I talked about in my other post. scandir on "/" actually returns an
empty directory after switching root, and any path lookups for absolute
paths fail. I didn't expect that, because I thought systemd removed the
contents of the old root but stopped at (bind) mounts. Again, this is
systemd-234.

If I chdir("/run") before switching root and chroot("..") afterwards
(*), I'm able to access everything just fine (**). However, if I do
this, I end up in the real root file system, which is what I wanted to
avoid in the first place.

So, I guess I'll have to create bind mounts for /dev, /sys etc. in the
old root, possibly after entering a private mount namespace? The other
option would be to save fds for the file systems I need to access and
use openat() only. Right?

Regards,
Martin

(*) Michal suggested simply doing chroot(".") instead. That might work
as well; I haven't tried it yet.

(**) For notification about switching root, I used epoll(EPOLLPRI) on
/proc/self/mountinfo, because I read that inotify doesn't work on proc.
Polling for EPOLLPRI works just fine.
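[Editor's note: the fd-saving idea at the end of this message ("save fds ... and use openat() only") could look like the following sketch, in Python with a scratch directory standing in for /dev; the calls with a dir_fd argument map directly to the *at() family (openat(2) etc.) in C.]

```python
import os
import tempfile

# Stand-in for /dev: a scratch directory containing one "device node".
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "sda"), "w") as f:
    f.write("fake device node\n")

# Save a directory fd *before* the mount tree changes under us.
dev_fd = os.open(scratch, os.O_RDONLY | os.O_DIRECTORY)

os.chdir("/")  # simulate losing the old cwd/root

# Absolute lookups may fail after a real switch root, but lookups
# relative to the saved fd still resolve against the old directory:
fd = os.open("sda", os.O_RDONLY, dir_fd=dev_fd)
content = os.read(fd, 64).decode()
os.close(fd)

# Directory listings work against a saved fd, too.
entries = sorted(os.listdir(dev_fd))
print(entries, content.strip())
```

The same pattern with an fd for the real /dev would let the daemon keep opening device nodes even though its view of "/" has become an empty directory.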
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Mi, 27.01.21 21:51, Martin Wilck (mwi...@suse.com) wrote:
> Meanwhile I've looked a bit deeper into the problems accessing "/dev"
> that I talked about in my other post. scandir on "/" actually returns
> an empty directory after switching root, and any path lookups for
> absolute paths fail. I didn't expect that, because I thought systemd
> removed the contents of the old root but stopped at (bind) mounts.
> Again, this is systemd-234.

Oh, right, we actually use MS_MOVE to move the old /dev to the new root.
If you stay behind in the old root you won't see anything anymore — it
got moved away.

Note that the switch root code also attempts to empty out the initrd
after the transition, or what's left of it. You might want to make the
initrd read-only if that is a problem for you.

> If I chdir("/run") before switching root and chroot("..") afterwards
> (*), I'm able to access everything just fine (**). However, if I do
> this, I end up in the real root file system, which is what I wanted to
> avoid in the first place.

Yes, this works the way it works because /run is moved to the new root,
and thus if you chroot to its parent you are in the new root.

> So, I guess I'll have to create bind mounts for /dev, /sys etc. in the
> old root, possibly after entering a private mount namespace?

If you want the initrd environment to fully continue to exist, consider
creating a new mount namespace, bind mount the initrd root into it
recursively to some new dir you created. Then afterwards mark that mount
MS_PRIVATE, then pivot_root()+chroot()+chdir() into your new old world.

Also, make the initrd superblock read-only, if you need its contents.

> The other option would be to save fds for the file systems I need to
> access and use openat() only. Right?

That works too, if you can.

> (**) For notification about switching root, I used epoll(EPOLLPRI) on
> /proc/self/mountinfo, because I read that inotify doesn't work on
> proc. Polling for EPOLLPRI works just fine.

Right, sorry. POLLPRI is the right API. inotify is used by cgroupfs for
similar notifications, and I mixed that up. For /proc/self/mountinfo
POLLPRI is the right choice.

Lennart

--
Lennart Poettering, Berlin
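[Editor's note: the notification scheme agreed on above, in sketch form. It polls /proc/self/mountinfo for EPOLLPRI (mount table changed) and, on each wakeup, checks whether /etc/initrd-release is still there; ENOENT means the switch root has happened. The 100 ms timeout is only so the demo terminates; a daemon would block indefinitely.]

```python
import os
import select

mi_fd = os.open("/proc/self/mountinfo", os.O_RDONLY)
ep = select.epoll()
# The kernel signals mount table changes on this file via (E)POLLPRI.
ep.register(mi_fd, select.EPOLLPRI | select.EPOLLERR)

still_in_initrd = os.access("/etc/initrd-release", os.F_OK)
events = ep.poll(timeout=0.1)  # a daemon would pass -1 and block

for fd, mask in events:
    if mask & select.EPOLLPRI:
        # Mount table changed; re-check the marker file.
        still_in_initrd = os.access("/etc/initrd-release", os.F_OK)

print("in initrd:", still_in_initrd)
ep.close()
os.close(mi_fd)
```

Outside an initramfs /etc/initrd-release never exists, so this prints "in initrd: False" immediately; inside the initrd it flips to False exactly when systemd switches root.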
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Tue, 2021-01-26 at 11:30 +0100, Lennart Poettering wrote:
> > Imagine two parallel instances of systemd-udevd (IMO there are
> > reasons to handle it like a "root storage daemon" in some distant
> > future).
>
> Hmm, wa? naahh.. udev is about discovery, it should not be required
> to maintain access to something you found.

True. But if udev ran without interruption, we could get rid of coldplug
after switching root. That could possibly save us a lot of trouble.
Anyway, it's just a thought I find tempting.

Regards,
Martin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Di, 26.01.21 13:30, Martin Wilck (mwi...@suse.com) wrote:
> On Tue, 2021-01-26 at 11:30 +0100, Lennart Poettering wrote:
> > > Imagine two parallel instances of systemd-udevd (IMO there are
> > > reasons to handle it like a "root storage daemon" in some distant
> > > future).
> >
> > Hmm, wa? naahh.. udev is about discovery, it should not be required
> > to maintain access to something you found.
>
> True. But if udev ran without interruption, we could get rid of
> coldplug after switching root. That could possibly save us a lot of
> trouble.

And introduce new trouble. Usually the rules on the host are more
comprehensive than those in the initrd. You have to coldplug for the
bigger ruleset. If you want to avoid that, you basically would have to
pack a ton more stuff into the initrd.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Di, 26.01.21 01:19, Martin Wilck (mwi...@suse.com) wrote:
> On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
> >
> > Consider using IgnoreOnIsolate=.
>
> I fail to make this work. Installed this in the initrd (note the
> ExecStop "command"):
>
> [Unit]
> Description=NVMe Event Monitor for Automatic Subsystem Connection
> Documentation=man:nvme-monitor(1)
> DefaultDependencies=false
> Conflicts=shutdown.target
> Requires=systemd-udevd-kernel.socket
> After=systemd-udevd-kernel.socket

Why do you require this?

My guess: the socket unit gets shut down, and since you have Requires=
on it you thus go away too.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Mo, 25.01.21 19:04, Martin Wilck (mwi...@suse.com) wrote:
> Is there any way for the daemon to get notified if root is switched?

/proc/self/mountinfo sends out notification events via inotify when
mounts are established/removed. I am pretty sure pivot_root() also
generates that. Your daemon could subscribe to that, and then recheck
each time whether /etc/initrd-release is still accessible. Once you see
ENOENT on that you can assume the switch root took place, then close the
inotify.

> Would there be a potential security issue because the daemon keeps a
> reference to the initrd root FS?

Modern initrds transition their own root to /run/initramfs anyway, so
this shouldn't be a problem normally.

> Imagine two parallel instances of systemd-udevd (IMO there are reasons
> to handle it like a "root storage daemon" in some distant future).

Hmm, wa? naahh.. udev is about discovery, it should not be required to
maintain access to something you found.

> > option two: if you cannot have multiple instances of your subsystem,
> > then the only option is to make the initrd version manage
> > everything. But of course, that sucks, but there's little one can do
> > about that.
>
> Why would it be so bad? I would actually prefer a single instance for
> most subsystems. But maybe I'm missing something.

Well, because you can't update things on-the-fly then, you cannot
reexec since everything is backed by the initrd. You cannot restart
things, and so on.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
>
> Consider using IgnoreOnIsolate=.

I fail to make this work. Installed this in the initrd (note the
ExecStop "command"):

[Unit]
Description=NVMe Event Monitor for Automatic Subsystem Connection
Documentation=man:nvme-monitor(1)
DefaultDependencies=false
Conflicts=shutdown.target
Requires=systemd-udevd-kernel.socket
After=systemd-udevd-kernel.socket
Before=sysinit.target systemd-udev-trigger.service nvmefc-boot-connections.service
RequiresMountsFor=/sys
IgnoreOnIsolate=true

[Service]
Type=simple
ExecStart=/usr/sbin/nvme monitor $NVME_MONITOR_OPTIONS
ExecStop=-/usr/bin/systemctl show -p IgnoreOnIsolate %N
KillMode=mixed

[Install]
WantedBy=sysinit.target

I verified (in a pre-pivot shell) that systemd had seen the
IgnoreOnIsolate property. But when initrd-switch-root.target is
isolated, the unit is cleanly stopped nonetheless.

[  192.832127] dolin systemd[1]: initrd-switch-root.target: Trying to enqueue job initrd-switch-root.target/start/isolate
[  192.836697] dolin systemd[1]: nvme-monitor.service: Installed new job nvme-monitor.service/stop as 98
[  193.027182] dolin systemctl[3751]: IgnoreOnIsolate=yes
[  193.029124] dolin systemd[1]: nvme-monitor.service: Changed running -> stop-sigterm
[  193.029353] dolin nvme[768]: monitor_main_loop: monitor: exit signal received
[  193.029535] dolin systemd[1]: Stopping NVMe Event Monitor for Automatic Subsystem Connection...
[  193.065746] dolin systemd[1]: Child 768 (nvme) died (code=exited, status=0/SUCCESS)
[  193.065905] dolin systemd[1]: nvme-monitor.service: Child 768 belongs to nvme-monitor.service
[  193.066073] dolin systemd[1]: nvme-monitor.service: Main process exited, code=exited, status=0/SUCCESS
[  193.066241] dolin systemd[1]: nvme-monitor.service: Changed stop-sigterm -> dead
[  193.066403] dolin systemd[1]: nvme-monitor.service: Job nvme-monitor.service/stop finished, result=done
[  193.066571] dolin systemd[1]: Stopped NVMe Event Monitor for Automatic Subsystem Connection.
[  193.500010] dolin systemd[1]: initrd-switch-root.target: Job initrd-switch-root.target/start finished, result=done
[  193.500188] dolin systemd[1]: Reached target Switch Root.

After boot, the service actually remains running when isolating e.g.
"rescue.target". But when switching root, it doesn't work.

dolin:~/:[141]# systemctl show -p IgnoreOnIsolate nvme-monitor.service
IgnoreOnIsolate=yes

Tested only with systemd-234 so far. Any ideas what I'm getting wrong?

Martin
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
> On Sa, 23.01.21 02:44, Martin Wilck (mwi...@suse.com) wrote:
>
> > 1) Even if a daemon is exempted from being killed by killall(), the
> > unit it belongs to will be stopped when initrd-switch-root.target is
> > isolated, and that will normally cause the daemon to be stopped,
> > too. AFAICS, the only way to ensure the daemon is not killed is by
> > setting "KillMode=none" in the unit file. Right? Any other mode
> > would send SIGKILL sooner or later even if my daemon was smart
> > enough to ignore SIGTERM when running in the initrd.
>
> Consider using IgnoreOnIsolate=.

Ah, thanks a lot. IIUC that would actually make systemd realize that the
unit continues to run after switching root, which is good.

Like I remarked for KillMode=none, IgnoreOnIsolate=true would be
suitable only for the "root storage daemon" instance, not for a possible
other instance serving data volumes only. I suppose there's no way to
make this directive conditional on being run from the initrd, so I'd
need two different unit files, or use a drop-in in the initrd.

Is there any way for the daemon to get notified if root is switched?

> > 3) The daemon that has been started in the initrd's root file system
> > is unable to access e.g. the /dev file system after switching root.
> > I haven't yet systematically analyzed which file systems are
> > available. I suppose this must be handled by creating bind mounts,
> > but I need guidance how to do this. Or would it be
> > possible/advisable for the daemon to also re-execute itself under
> > the real root, like systemd itself? I thought the root storage
> > daemon idea was developed to prevent exactly that.
>
> Not sure why it wouldn't be able to access /dev after switching. We do
> not allocate any new instance of that, it's always the same devtmpfs
> instance.

I haven't dug deeper yet, I just saw "No such file or directory" error
messages trying to access device nodes that I knew existed, so I
concluded there were issues with /dev.

> Do not reexec onto the host fs, that's really not how this should be
> done.

Would there be a potential security issue because the daemon keeps a
reference to the initrd root FS?

> > 4) Most daemons that might qualify as "root storage daemon" also
> > have a "normal" mode, when the storage they serve is _not_ used as
> > root FS, just for data storage. In that case, it's probably
> > preferable to run them from inside the root FS rather than as root
> > storage daemon. That has various advantages, e.g. the possibility
> > to update the software without rebooting. It's not clear to me yet
> > how to handle the two options (root and non-root) cleanly with unit
> > files.
>
> option one: have two unit files? i.e. two instances of the subsystem,
> one managing the root storage, and one the rest.

Hm, that looks clumsy to me. It could be done e.g. for multipath by
using separate configuration files and setting up appropriate
blacklists, but it would cause a lot of work to be done twice, e.g.
uevents would be received by both daemons and acted upon simultaneously.
Generally ruling out race conditions wouldn't be easy. Imagine two
parallel instances of systemd-udevd (IMO there are reasons to handle it
like a "root storage daemon" in some distant future).

> option two: if you cannot have multiple instances of your subsystem,
> then the only option is to make the initrd version manage everything.
> But of course, that sucks, but there's little one can do about that.

Why would it be so bad? I would actually prefer a single instance for
most subsystems. But maybe I'm missing something.

Thanks,
Martin
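[Editor's note: the drop-in approach floated above could look like the following sketch. The unit name and path are hypothetical; the drop-in would be shipped only inside the initrd image (e.g. by a dracut module), so the host copy of the unit keeps its normal settings.]

```ini
# /etc/systemd/system/mydaemon.service.d/10-initrd.conf
# (hypothetical; present only in the initrd, not on the root FS)

[Unit]
# Survive "systemctl isolate initrd-switch-root.target"
IgnoreOnIsolate=true

[Service]
# Do not let systemd kill the daemon during the initrd->root transition
KillMode=none
```

This keeps a single canonical unit file while making the initrd-only settings conditional on actually running from the initrd.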
Re: [systemd-devel] Questions about systemd's "root storage daemon" concept
On Sa, 23.01.21 02:44, Martin Wilck (mwi...@suse.com) wrote:
> Hi
>
> I'm experimenting with systemd's root storage daemon concept
> (https://systemd.io/ROOT_STORAGE_DAEMONS/).
>
> I'm starting my daemon from a service unit in the initrd, and
> I set argv[0][0] to '@', as suggested in the text.
>
> So far so good, the daemon isn't killed.
>
> But a lot more is necessary to make this actually *work*. Here's a
> list of issues I found, and what ideas I've had so far how to deal
> with them. I'd appreciate some guidance.
>
> 1) Even if a daemon is exempted from being killed by killall(), the
> unit it belongs to will be stopped when initrd-switch-root.target is
> isolated, and that will normally cause the daemon to be stopped, too.
> AFAICS, the only way to ensure the daemon is not killed is by setting
> "KillMode=none" in the unit file. Right? Any other mode would send
> SIGKILL sooner or later even if my daemon was smart enough to ignore
> SIGTERM when running in the initrd.

Consider using IgnoreOnIsolate=.

> 3) The daemon that has been started in the initrd's root file system
> is unable to access e.g. the /dev file system after switching root.
> I haven't yet systematically analyzed which file systems are
> available. I suppose this must be handled by creating bind mounts,
> but I need guidance how to do this. Or would it be
> possible/advisable for the daemon to also re-execute itself under
> the real root, like systemd itself? I thought the root storage
> daemon idea was developed to prevent exactly that.

Not sure why it wouldn't be able to access /dev after switching. We do
not allocate any new instance of that, it's always the same devtmpfs
instance.

Do not reexec onto the host fs, that's really not how this should be
done.

> 4) Most daemons that might qualify as "root storage daemon" also have
> a "normal" mode, when the storage they serve is _not_ used as root FS,
> just for data storage. In that case, it's probably preferable to run
> them from inside the root FS rather than as root storage daemon. That
> has various advantages, e.g. the possibility to update the software
> without rebooting. It's not clear to me yet how to handle the two
> options (root and non-root) cleanly with unit files.

option one: have two unit files? i.e. two instances of the subsystem,
one managing the root storage, and one the rest.

option two: if you cannot have multiple instances of your subsystem,
then the only option is to make the initrd version manage everything.
But of course, that sucks, but there's little one can do about that.

Lennart

--
Lennart Poettering, Berlin
[systemd-devel] Questions about systemd's "root storage daemon" concept
Hi,

I'm experimenting with systemd's root storage daemon concept
(https://systemd.io/ROOT_STORAGE_DAEMONS/).

I'm starting my daemon from a service unit in the initrd, and I set
argv[0][0] to '@', as suggested in the text.

So far so good, the daemon isn't killed.

But a lot more is necessary to make this actually *work*. Here's a list
of issues I found, and what ideas I've had so far how to deal with them.
I'd appreciate some guidance.

1) Even if a daemon is exempted from being killed by killall(), the unit
it belongs to will be stopped when initrd-switch-root.target is
isolated, and that will normally cause the daemon to be stopped, too.
AFAICS, the only way to ensure the daemon is not killed is by setting
"KillMode=none" in the unit file. Right? Any other mode would send
SIGKILL sooner or later even if my daemon was smart enough to ignore
SIGTERM when running in the initrd.

2) KillMode=none will make systemd consider the respective unit stopped,
even if the daemon is still running. That feels wrong. Are there better
options?

3) The daemon that has been started in the initrd's root file system is
unable to access e.g. the /dev file system after switching root. I
haven't yet systematically analyzed which file systems are available. I
suppose this must be handled by creating bind mounts, but I need
guidance how to do this. Or would it be possible/advisable for the
daemon to also re-execute itself under the real root, like systemd
itself? I thought the root storage daemon idea was developed to prevent
exactly that.

4) Most daemons that might qualify as "root storage daemon" also have a
"normal" mode, when the storage they serve is _not_ used as root FS,
just for data storage. In that case, it's probably preferable to run
them from inside the root FS rather than as root storage daemon. That
has various advantages, e.g. the possibility to update the software
without rebooting. It's not clear to me yet how to handle the two
options (root and non-root) cleanly with unit files.

- If (for "root storage daemon" mode) I simply put the enabled unit file
  in the initrd, systemd will start the daemon twice, at least if it's a
  "simple" service. I considered working with conditions, such as
  ConditionPathExists=!/run/my-daemon/my-pidfile (where the pidfile
  would have been created by the initrd-based daemon), but that would
  cause the unit in the root FS to fail, which is ugly.

- I could (for root mode) add the enabled unit file to the initrd and
  afterwards disable it in the root FS, thus avoiding two copies being
  started. But that would cause issues whenever the initrd must be
  rebuilt. I suppose it could be handled with a dracut module.

- I could create two different unit files, mydaemon.service and
  mydaemon-initrd.service, and have them conflict. dracut doesn't
  support this out of the box. A separate dracut module would be
  necessary, too.

- Some settings such as KillMode=none make sense for the service in the
  initrd environment, but not for the one running in the root FS, and
  vice versa. This is another argument for having separate unit files,
  or initrd-specific drop-ins.

Bottom line for 4) is that a dracut module specific to the daemon at
hand must be written. That dracut module would need to figure out
whether the service is required for mounting root, and activate "root
storage daemon" mode by adding the service to the initrd. The instance
in the root FS would then either need to be disabled, or be smart enough
to detect the situation and exit gracefully. Ideally, "systemctl status"
would show the service as running even though the instance inside the
root FS isn't actually running. I am unsure whether all this can be
achieved easily with the current systemd functionality, please advise.

I hope this makes at least some sense. Suggestions and feedback welcome.

Regards,
Martin