Re: [systemd-devel] Delaying VM startup until block devices are available

2024-01-26 Thread Andrei Borzenkov

On 27.01.2024 00:40, Orion Poplawski wrote:
> On 1/26/24 01:21, Lennart Poettering wrote:
>> On Do, 25.01.24 16:28, Orion Poplawski (or...@nwra.com) wrote:
>>
>>> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
>>> are decrypted by clevis.  The problem we are seeing at the moment is that the
>>> VMs are started before the block devices are decrypted.  Our current
>>> solution is:
>>
>> We generally wait for all devices listed in /etc/crypttab, unless you
>> set noauto or nofail.
>
> We are setting 'nofail', because I don't think I want to fail the boot in
> general.  They are not required for the system itself to function, just
> certain VMs, e.g.:
>
> luks-backup /dev/vg_root/backup-raw none discard,_netdev,nofail
>
> See below for more though.
>
>>> # cat /etc/systemd/system/virtqemud.service.d/override.conf
>>> [Unit]
>>> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>>>
>>> Where we list each of the volumes to be decrypted as blocking the virtqemud
>>> service.
>>>
>>> Does anyone have any better alternatives?  My main issue is that it feels
>>> somewhere in between fine-grained and coarse-grained control.
>>>
>>> Ideally I think one would be able to have each individual VM startup
>>> automatically delayed until the devices each used became available, but I
>>> don't see how to do this.
>>
>> I am not sure how libvirt works, but if it runs every VM in a systemd
>> unit, then you could just order the device before that unit, or the
>> unit after the device.
>>
>> Really depends on how libvirt splits things up.
>
> I'm honestly not sure how libvirt works here either.  But there seems to be
> this:
>
> # rpm -qf /usr/lib/systemd/system/virtqemud.service
> libvirt-daemon-driver-qemu-9.5.0-7.el9_3.alma.2.x86_64
>
> which gets started:
>
> Jan 25 14:42:58 systemd[1]: Starting Virtualization qemu daemon...
> Jan 25 14:42:58 systemd[1]: Started Virtualization qemu daemon.
>
> Then the qemu-kvm processes end up in their own scope:
>
> ● machine-qemu\x2d1\x2dsrv\x2dmry01.scope - Virtual Machine qemu-1-srv-mry01
>  Loaded: loaded
> (/run/systemd/transient/machine-qemu\x2d1\x2dsrv\x2dmry01.scope; transient)
>   Transient: yes
>  Active: active (running) since Thu 2024-01-25 14:42:58 PST; 22h ago
>   Tasks: 6 (limit: 16384)
>  Memory: 15.6G
> CPU: 1h 15min 44.863s
>  CGroup: /machine.slice/machine-qemu\x2d1\x2dsrv\x2dmry01.scope
>  └─libvirt
>    └─9086 /usr/libexec/qemu-kvm -name guest=...
>
>>> Alternatively it seems like one should be able to delay all VM startup until
>>> all volumes in /etc/crypttab were unlocked, rather than having to specify each
>>> one.  But I don't see a target for that.
>>
>> This is default behaviour. Anything listed in /etc/crypttab is ordered
>> before cryptsetup.target, which is ordered before sysinit.target,
>> which is ordered before basic.target, which is ordered before regular services.
>
> We are specifying _netdev because they require the network to unlock.  This I
> think puts them under remote-cryptsetup.target, and I used to depend on that.
> But with EL9 I'm seeing:
>
> # j -b -u remote-cryptsetup.target -u
> 'blockdev@dev-mapper-luks\x2dbackup.target' -u clevis-luks-askpass.service
> --no-hostname
>
> Jan 25 14:42:12 systemd[1]: Reached target Remote Encrypted Volumes.
> Jan 25 14:42:12 systemd[1]: Started Forward Password Requests to Clevis.
> Jan 25 14:42:48 clevis-luks-askpass[1706]: Unlocked /dev/vg_root/backup-raw
> (UUID=d6d25a85-2d43-4780-a312-e0e9b2383807) successfully
> Jan 25 14:42:54 systemd[1]: Reached target Block Device Preparation for
> /dev/mapper/luks-backup.
> Jan 25 14:42:59 systemd[1]: clevis-luks-askpass.service: Deactivated
> successfully.
>
> # systemctl list-dependencies remote-cryptsetup.target
> remote-cryptsetup.target
> ● ├─systemd-cryptsetup@luks\x2dbackup.service
>
> # j --no-hostname -b -u 'systemd-cryptsetup@luks\x2dbackup.service'
> Jan 25 14:42:12 systemd[1]: Starting Cryptography Setup for luks-backup...
> Jan 25 14:42:42 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64,
> key size 512 bits for device /dev/vg_root/backup-raw.
> Jan 25 14:42:47 systemd-cryptsetup[1697]: Failed to activate with specified
> passphrase. (Passphrase incorrect?)
> Jan 25 14:42:48 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64,
> key size 512 bits for device /dev/vg_root/backup-raw.
> Jan 25 14:42:54 systemd[1]: Finished Cryptography Setup for luks-backup.
>
> # systemctl show 'systemd-cryptsetup@luks\x2dbackup.service' | grep Type
> Type=oneshot
>
> So, if I'm following things correctly, this doesn't seem right.
> remote-cryptsetup.target depends on systemd-cryptsetup@luks\x2dbackup.service.
> This is a oneshot that is considered started after the main process exits,
> and above is shown as 14:42:54.  But we are seeing 'Reached target Remote
> Encrypted Volumes' at 14:42:12.
>
> What am I missing?
>
> systemd-252-18.el9.x86_64

"nofail" encrypted devices are not ordered before 
(remote-)cryptsetup.target to not delay startup. The reasoning is, if 
you do not care whether this device exists or not, there is no reason to 

Re: [systemd-devel] Delaying VM startup until block devices are available

2024-01-26 Thread Orion Poplawski
On 1/26/24 01:21, Lennart Poettering wrote:
> On Do, 25.01.24 16:28, Orion Poplawski (or...@nwra.com) wrote:
> 
>> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
>> are decrypted by clevis.  The problem we are seeing at the moment is that the
>> VMs are started before the block devices are decrypted.  Our current
>> solution is:
> 
> We generally wait for all devices listed in /etc/crypttab, unless you
> set noauto or nofail.

We are setting 'nofail', because I don't think I want to fail the boot in
general.  They are not required for the system itself to function, just
certain VMs, e.g.:

luks-backup /dev/vg_root/backup-raw none discard,_netdev,nofail

See below for more though.

>> # cat /etc/systemd/system/virtqemud.service.d/override.conf
>> [Unit]
>> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>>
>> Where we list each of the volumes to be decrypted as blocking the virtqemud
>> service.
>>
>> Does anyone have any better alternatives?  My main issue is that it feels
>> somewhere in between fine-grained and coarse-grained control.
>>
>> Ideally I think one would be able to have each individual VM startup
>> automatically delayed until the devices each used became available, but I
>> don't see how to do this.
> 
> I am not sure how libvirt works, but if it runs every VM in a systemd
> unit, then you could just order the device before that unit, or the
> unit after the device.
> 
> Really depends on how libvirt splits things up.

I'm honestly not sure how libvirt works here either.  But there seems to be 
this:

# rpm -qf /usr/lib/systemd/system/virtqemud.service
libvirt-daemon-driver-qemu-9.5.0-7.el9_3.alma.2.x86_64

which gets started:

Jan 25 14:42:58 systemd[1]: Starting Virtualization qemu daemon...
Jan 25 14:42:58 systemd[1]: Started Virtualization qemu daemon.

Then the qemu-kvm processes end up in their own scope:

● machine-qemu\x2d1\x2dsrv\x2dmry01.scope - Virtual Machine qemu-1-srv-mry01
 Loaded: loaded
(/run/systemd/transient/machine-qemu\x2d1\x2dsrv\x2dmry01.scope; transient)
  Transient: yes
 Active: active (running) since Thu 2024-01-25 14:42:58 PST; 22h ago
  Tasks: 6 (limit: 16384)
 Memory: 15.6G
CPU: 1h 15min 44.863s
 CGroup: /machine.slice/machine-qemu\x2d1\x2dsrv\x2dmry01.scope
 └─libvirt
   └─9086 /usr/libexec/qemu-kvm -name guest=...

> 
>> Alternatively it seems like one should be able to delay all VM startup until
>> all volumes in /etc/crypttab were unlocked, rather than having to specify each
>> one.  But I don't see a target for that.
> 
> This is default behaviour. Anything listed in /etc/crypttab is ordered
> before cryptsetup.target, which is ordered before sysinit.target,
> which is ordered before basic.target, which is ordered before regular 
> services.

We are specifying _netdev because they require the network to unlock.  This I
think puts them under remote-cryptsetup.target, and I used to depend on that.
But with EL9 I'm seeing:

# j -b -u remote-cryptsetup.target -u
'blockdev@dev-mapper-luks\x2dbackup.target' -u clevis-luks-askpass.service
--no-hostname

Jan 25 14:42:12 systemd[1]: Reached target Remote Encrypted Volumes.
Jan 25 14:42:12 systemd[1]: Started Forward Password Requests to Clevis.
Jan 25 14:42:48 clevis-luks-askpass[1706]: Unlocked /dev/vg_root/backup-raw
(UUID=d6d25a85-2d43-4780-a312-e0e9b2383807) successfully
Jan 25 14:42:54 systemd[1]: Reached target Block Device Preparation for
/dev/mapper/luks-backup.
Jan 25 14:42:59 systemd[1]: clevis-luks-askpass.service: Deactivated 
successfully.

# systemctl list-dependencies remote-cryptsetup.target
remote-cryptsetup.target
● ├─systemd-cryptsetup@luks\x2dbackup.service

# j --no-hostname -b -u 'systemd-cryptsetup@luks\x2dbackup.service'
Jan 25 14:42:12 systemd[1]: Starting Cryptography Setup for luks-backup...
Jan 25 14:42:42 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64,
key size 512 bits for device /dev/vg_root/backup-raw.
Jan 25 14:42:47 systemd-cryptsetup[1697]: Failed to activate with specified
passphrase. (Passphrase incorrect?)
Jan 25 14:42:48 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64,
key size 512 bits for device /dev/vg_root/backup-raw.
Jan 25 14:42:54 systemd[1]: Finished Cryptography Setup for luks-backup.

# systemctl show 'systemd-cryptsetup@luks\x2dbackup.service' | grep Type
Type=oneshot

So, if I'm following things correctly, this doesn't seem right.
remote-cryptsetup.target depends on systemd-cryptsetup@luks\x2dbackup.service.
 This is a oneshot that is considered started after the main process exits,
and above is shown as 14:42:54.  But we are seeing 'Reached target Remote
Encrypted Volumes' at 14:42:12.

What am I missing?

systemd-252-18.el9.x86_64


-- 
Orion Poplawski
he/him/his  - surely the least important thing about me
Manager of IT Systems  720-772-5637
NWRA, Boulder/CoRA Office 

Re: [systemd-devel] Delaying VM startup until block devices are available

2024-01-26 Thread Lennart Poettering
On Do, 25.01.24 16:28, Orion Poplawski (or...@nwra.com) wrote:

> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
> are decrypted by clevis.  The problem we are seeing at the moment is that the
> VMs are started before the block devices are decrypted.  Our current
> solution is:

We generally wait for all devices listed in /etc/crypttab, unless you
set noauto or nofail.

>
> # cat /etc/systemd/system/virtqemud.service.d/override.conf
> [Unit]
> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>
> Where we list each of the volumes to be decrypted as blocking the virtqemud
> service.
>
> Does anyone have any better alternatives?  My main issue is that it feels
> somewhere in between fine-grained and coarse-grained control.
>
> Ideally I think one would be able to have each individual VM startup
> automatically delayed until the devices each used became available, but I
> don't see how to do this.

I am not sure how libvirt works, but if it runs every VM in a systemd
unit, then you could just order the device before that unit, or the
unit after the device.

Really depends on how libvirt splits things up.

> Alternatively it seems like one should be able to delay all VM startup until
> all volumes in /etc/crypttab were unlocked, rather than having to specify each
> one.  But I don't see a target for that.

This is default behaviour. Anything listed in /etc/crypttab is ordered
before cryptsetup.target, which is ordered before sysinit.target,
which is ordered before basic.target, which is ordered before regular services.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Delaying VM startup until block devices are available

2024-01-25 Thread Andrei Borzenkov
On Fri, Jan 26, 2024 at 2:29 AM Orion Poplawski wrote:
>
> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
> are decrypted by clevis.  The problem we are seeing at the moment is that the
> VMs are started before the block devices are decrypted.  Our current
> solution is:
>
> # cat /etc/systemd/system/virtqemud.service.d/override.conf
> [Unit]
> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>

This only works if it is guaranteed that the blockdev@xxx.target start job
is already queued when the virtqemud.service start is requested. In
practice, systemd-cryptsetup is invoked early, before any "normal"
service, so it appears to work. But to be on the safe side you probably
need

Wants=systemd-cryptsetup@backup.service

or whatever service is used to decrypt the device.
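
A minimal sketch of that suggestion as a drop-in, written by a small shell helper. The unit name systemd-cryptsetup@luks\x2dbackup.service is taken from the list-dependencies output elsewhere in this thread; the helper name write_virtqemud_dropin and the file name 10-wait-luks.conf are made up for illustration, and whether Wants= (rather than Requires=) is the right strength depends on whether virtqemud should still start when unlocking fails.

```shell
#!/bin/sh
# Sketch only: write a drop-in that pulls in the unlock unit and orders
# virtqemud.service after it. Helper and file names are hypothetical.
write_virtqemud_dropin() {
    dir="$1"    # e.g. /etc/systemd/system/virtqemud.service.d
    mkdir -p "$dir" || return 1
    cat > "$dir/10-wait-luks.conf" <<'EOF'
[Unit]
# Wants= queues the unlock job; After= only orders jobs that are queued.
Wants=systemd-cryptsetup@luks\x2dbackup.service
After=systemd-cryptsetup@luks\x2dbackup.service blockdev@dev-mapper-luks\x2dbackup.target
EOF
}
```

Usage would be something like `write_virtqemud_dropin /etc/systemd/system/virtqemud.service.d` followed by `systemctl daemon-reload`. With Wants=, a failed unlock does not stop virtqemud from starting; it only delays it until the unlock job has finished one way or the other.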

> Where we list each of the volumes to be decrypted as blocking the virtqemud
> service.
>
> Does anyone have any better alternatives?  My main issue is that it feels
> somewhere in between fine-grained and coarse-grained control.
>
> Ideally I think one would be able to have each individual VM startup
> automatically delayed until the devices each used became available, but I
> don't see how to do this.
>

Create a systemd generator that parses VM configuration(s) and adds
those requirements on startup.
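
A rough sketch of what such a generator could look like, under several assumptions that are not from this thread: that the autostarted domain XML files are reachable under a directory such as /etc/libvirt/qemu/autostart, that disks appear as `<source dev='/dev/mapper/...'/>` one per line, and that ordering virtqemud.service (not the individual VM scopes) is good enough. The function names escape_path and generate and the file name 50-vm-disks.conf are hypothetical; escape_path is a simplified stand-in for `systemd-escape --path` that handles plain ASCII paths only.

```shell
#!/bin/sh
# Simplified stand-in for `systemd-escape --path` (ASCII only, and it
# does not special-case a leading dot or the root directory).
escape_path() {
    p=$(printf '%s' "$1" | sed -e 's|//*|/|g' -e 's|^/||' -e 's|/$||')
    out=
    while [ -n "$p" ]; do
        c=${p%"${p#?}"}     # first character of $p
        p=${p#?}            # remainder
        case $c in
            /) out="$out-" ;;
            [a-zA-Z0-9:_.]) out="$out$c" ;;
            *) out="$out$(printf '\\x%02x' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Scan domain XML files for /dev/mapper disks and write one drop-in that
# orders virtqemud.service after the matching blockdev@ targets.
generate() {
    xml_dir="$1"    # assumed location of the domain XML files
    out_dir="$2"    # the generator's first argument (normal output dir)
    units=
    for xml in "$xml_dir"/*.xml; do
        [ -e "$xml" ] || continue
        # Assumes device paths contain no whitespace.
        for dev in $(sed -n "s|.*<source dev='\(/dev/mapper/[^']*\)'.*|\1|p" "$xml"); do
            units="$units blockdev@$(escape_path "$dev").target"
        done
    done
    [ -n "$units" ] || return 0
    mkdir -p "$out_dir/virtqemud.service.d" || return 1
    printf '[Unit]\nAfter=%s\n' "${units# }" \
        > "$out_dir/virtqemud.service.d/50-vm-disks.conf"
}
```

Installed as an executable under /etc/systemd/system-generators/, the script would end with a call like `generate /etc/libvirt/qemu/autostart "$1"`. This only automates the After= list from the original override; it still delays the whole daemon rather than each VM, and the caveat above about the start job needing to be queued applies to the generated drop-in as well.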

> Alternatively it seems like one should be able to delay all VM startup until
> all volumes in /etc/crypttab were unlocked, rather than having to specify each
> one.  But I don't see a target for that.
>

As long as all entries in /etc/crypttab are auto and are not nofail,
they are ordered before cryptsetup.target, which itself is ordered before
sysinit.target. So any normal service should start only after all
systemd-cryptsetup@xxx.service have completed. After=blockdev@... is
more relevant for shutdown, to ensure applications requiring this
block device will be shut down before systemd-cryptsetup@.service.

I do not know how clevis hooks into all of this. Does it use
systemd-cryptsetup@.service at all?


Re: [systemd-devel] Delaying VM startup until block devices are available

2024-01-25 Thread Mantas Mikulėnas
On Fri, Jan 26, 2024 at 1:29 AM Orion Poplawski wrote:

> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
> are decrypted by clevis.  The problem we are seeing at the moment is that the
> VMs are started before the block devices are decrypted.  Our current
> solution is:
>
> # cat /etc/systemd/system/virtqemud.service.d/override.conf
> [Unit]
> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>
> Where we list each of the volumes to be decrypted as blocking the virtqemud
> service.
>
> Does anyone have any better alternatives?  My main issue is that it feels
> somewhere in between fine-grained and coarse-grained control.
>
> Ideally I think one would be able to have each individual VM startup
> automatically delayed until the devices each used became available, but I
> don't see how to do this.
>

You can't really do this with systemd if it's not systemd that does the
startup... The libvirt daemons need to be patched to watch udev events and
wait for the devices they require.


>
> Alternatively it seems like one should be able to delay all VM startup until
> all volumes in /etc/crypttab were unlocked, rather than having to specify each
> one.  But I don't see a target for that.
>

If this were plain systemd-cryptsetup, you could add a drop-in for
"systemd-cryptsetup@.service" that adds Before=foo.target. I'm not sure if
clevis integrates with that. (Although honestly I don't see much point in
using clevis for data volumes at all – just use it for the rootfs, and
regular keyfiles in /etc/private for everything else...)
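
A minimal sketch of the drop-in idea, assuming the volumes are in fact unlocked through systemd-cryptsetup@.service instances. The target name vm-disks.target stands in for "foo.target" and is hypothetical, as are the helper name write_vm_disks_target and the drop-in file names.

```shell
#!/bin/sh
# Sketch only: create a custom target, order every systemd-cryptsetup@
# instance before it, and order virtqemud.service after it.
write_vm_disks_target() {
    root="$1"    # e.g. /etc/systemd/system
    mkdir -p "$root/systemd-cryptsetup@.service.d" \
             "$root/virtqemud.service.d" || return 1
    cat > "$root/vm-disks.target" <<'EOF'
[Unit]
Description=Encrypted volumes used by VMs (hypothetical target)
EOF
    # A drop-in on the template name applies to all of its instances.
    cat > "$root/systemd-cryptsetup@.service.d/10-before-vm-disks.conf" <<'EOF'
[Unit]
Before=vm-disks.target
EOF
    cat > "$root/virtqemud.service.d/20-after-vm-disks.conf" <<'EOF'
[Unit]
Wants=vm-disks.target
After=vm-disks.target
EOF
}
```

After `write_vm_disks_target /etc/systemd/system` and a `systemctl daemon-reload`, virtqemud would wait for vm-disks.target, which in turn waits for whichever systemd-cryptsetup@ jobs are queued at that point. Before= is ordering only, so this relies on the unlock services being pulled in by something else at boot.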

-- 
Mantas Mikulėnas


[systemd-devel] Delaying VM startup until block devices are available

2024-01-25 Thread Orion Poplawski
We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
are decrypted by clevis.  The problem we are seeing at the moment is that the
VMs are started before the block devices are decrypted.  Our current
solution is:

# cat /etc/systemd/system/virtqemud.service.d/override.conf
[Unit]
After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target

Where we list each of the volumes to be decrypted as blocking the virtqemud
service.

Does anyone have any better alternatives?  My main issue is that it feels
somewhere in between fine-grained and coarse-grained control.

Ideally I think one would be able to have each individual VM startup
automatically delayed until the devices each used became available, but I
don't see how to do this.

Alternatively it seems like one should be able to delay all VM startup until
all volumes in /etc/crypttab were unlocked, rather than having to specify each
one.  But I don't see a target for that.

Thank you for your consideration,
  Orion

-- 
Orion Poplawski
he/him/his  - surely the least important thing about me
Manager of IT Systems  720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301 https://www.nwra.com/

