RE: IPv6 Compliance for networkd

2023-12-11 Thread Muggeridge, Matt



> -----Original Message-----
> From: Demi Marie Obenour 
> Sent: Tuesday, December 12, 2023 11:38 AM
> To: Muggeridge, Matt ; systemd-
> de...@lists.freedesktop.org
> Subject: Re: IPv6 Compliance for networkd
> 
> On Mon, Dec 11, 2023 at 10:52:31PM +, Muggeridge, Matt wrote:
> >
> >
> > > -----Original Message-----
> > > From: Demi Marie Obenour 
> > > Sent: Tuesday, December 12, 2023 7:14 AM
> > > To: Muggeridge, Matt ; systemd-
> > > de...@lists.freedesktop.org
> > > Subject: Re: IPv6 Compliance for networkd
> > >
> > > On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> > > > Hello, networkd developer community,
> > > >
> > > > > I am hoping to rally support for making networkd IPv6 compliant
> > > > > and I'm willing to help, but cannot do it alone. Is there any
> > > > > interest in making systemd-networkd IPv6 compliant?
> > > >
> > > > > There are many organizations (especially US Government) that
> > > > > mandate IPv6 compliance (USGv6).  Products that are dependent on
> > > > > networkd cannot be bid to these customers.
> > > >
> > > > How do I engage with the right people in the developer community?
> > > >
> > > > Thanks,
> > > > Matt.
> > > > > PS: Mailing list topics go unanswered and github issues get lost
> > > > > in the noise, so I'm hoping there's a more efficient way to
> > > > > collaborate.
> > >
> > > In what specific ways is networkd not compliant?
> > > --
> > > Sincerely,
> > > Demi Marie Obenour (she/her/hers)
> > > Invisible Things Lab
> >
> > Hi Demi,
> >
> > > In what specific ways is networkd not compliant?
> >
> > > Refer to previous mailing list topics [1] and github issues, especially
> > > any issues opened by LiveFreeAndRoam [2].
> >
> > Are you a networkd developer?  Are you willing to collaborate on this?
> >
> > [1]
> > https://www.mail-archive.com/search?a=1=systemd-
> devel%40lists.freede
> >
> sktop.org=ipv6+compliance=0=0==
> in=1
> > d===relevance [2]
> >
> https://github.com/systemd/systemd/issues?q=is%3Aissue+author%3Alivefr
> > eeandroam
> 
> If you need these problems fixed so that you can use systemd-networkd in
> your commercial products, I recommend getting your company to pay
> developers to fix systemd-networkd.
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab

I appreciate that point, too. It seems prudent to request collaboration from 
others in the community, in case there is overlapping interest.

Thanks,
Matt.




Re: IPv6 Compliance for networkd

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 10:52:31PM +, Muggeridge, Matt wrote:
> 
> 
> > -----Original Message-----
> > From: Demi Marie Obenour 
> > Sent: Tuesday, December 12, 2023 7:14 AM
> > To: Muggeridge, Matt ; systemd-
> > de...@lists.freedesktop.org
> > Subject: Re: IPv6 Compliance for networkd
> > 
> > On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> > > Hello, networkd developer community,
> > >
> > > > I am hoping to rally support for making networkd IPv6 compliant and I'm
> > > > willing to help, but cannot do it alone. Is there any interest in making
> > > > systemd-networkd IPv6 compliant?
> > >
> > > > There are many organizations (especially US Government) that mandate
> > > > IPv6 compliance (USGv6).  Products that are dependent on networkd cannot
> > > > be bid to these customers.
> > >
> > > How do I engage with the right people in the developer community?
> > >
> > > Thanks,
> > > Matt.
> > > > PS: Mailing list topics go unanswered and github issues get lost in the
> > > > noise, so I'm hoping there's a more efficient way to collaborate.
> > 
> > In what specific ways is networkd not compliant?
> > --
> > Sincerely,
> > Demi Marie Obenour (she/her/hers)
> > Invisible Things Lab
> 
> Hi Demi,
> 
> > In what specific ways is networkd not compliant?
> 
> Refer to previous mailing list topics [1] and github issues, especially any 
> issues opened by LiveFreeAndRoam [2].
> 
> Are you a networkd developer?  Are you willing to collaborate on this?
> 
> [1] 
> https://www.mail-archive.com/search?a=1=systemd-devel%40lists.freedesktop.org=ipv6+compliance=0=0===1d===relevance
> [2] 
> https://github.com/systemd/systemd/issues?q=is%3Aissue+author%3Alivefreeandroam

If you need these problems fixed so that you can use systemd-networkd in
your commercial products, I recommend getting your company to pay
developers to fix systemd-networkd.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature


RE: IPv6 Compliance for networkd

2023-12-11 Thread Muggeridge, Matt



> -----Original Message-----
> From: Demi Marie Obenour 
> Sent: Tuesday, December 12, 2023 7:14 AM
> To: Muggeridge, Matt ; systemd-
> de...@lists.freedesktop.org
> Subject: Re: IPv6 Compliance for networkd
> 
> On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> > Hello, networkd developer community,
> >
> > > I am hoping to rally support for making networkd IPv6 compliant and I'm
> > > willing to help, but cannot do it alone. Is there any interest in making
> > > systemd-networkd IPv6 compliant?
> >
> > > There are many organizations (especially US Government) that mandate
> > > IPv6 compliance (USGv6).  Products that are dependent on networkd cannot
> > > be bid to these customers.
> >
> > How do I engage with the right people in the developer community?
> >
> > Thanks,
> > Matt.
> > > PS: Mailing list topics go unanswered and github issues get lost in the
> > > noise, so I'm hoping there's a more efficient way to collaborate.
> 
> In what specific ways is networkd not compliant?
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab

Hi Demi,

> In what specific ways is networkd not compliant?

Refer to previous mailing list topics [1] and github issues, especially any 
issues opened by LiveFreeAndRoam [2].

Are you a networkd developer?  Are you willing to collaborate on this?

[1] 
https://www.mail-archive.com/search?a=1=systemd-devel%40lists.freedesktop.org=ipv6+compliance=0=0===1d===relevance
[2] 
https://github.com/systemd/systemd/issues?q=is%3Aissue+author%3Alivefreeandroam

Thanks,
Matt.


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Luca Boccassi
On Mon, 11 Dec 2023 at 21:20, Demi Marie Obenour
 wrote:
>
> On Mon, Dec 11, 2023 at 08:58:58PM +, Luca Boccassi wrote:
> > On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
> >  wrote:
> > >
> > > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA512
> > >
> > > On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> > > > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> > > >  wrote:
> > > > >
> > > > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > > > > >
> > > > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > > > > storage devices initialized. storage-init is a process that is not
> > > > > > > > designed to replace init, it does just enough to initialize storage
> > > > > > > > (performs a targeted udev trigger on storage), switches to
> > > > > > > > initoverlayfs as root and then executes init.
> > > > > > > >
> > > > > > > > ```
> > > > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > > > >
> > > > > > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > > > > > ```
> > > > > > >
> > > > > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > > > > there two lines?
> > > > > >
> > > > > > So, I generally would agree that the current initrd scheme is not
> > > > > > ideal, and we have been discussing better approaches. But I am not
> > > > > > sure your approach really is useful on generic systems for two
> > > > > > reasons:
> > > > > >
> > > > > > > 1. no security model? you need to authenticate your initrd in
> > > > > > >2023. There's no excuse for not doing that anymore these days. Not
> > > > > > >in automotive, and not anywhere else really.
> > > > > >
> > > > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > > > >unlock their root disks with TPM2 and similar things. People use
> > > > > >RAID, LVM, and all that mess.
> > > > > >
> > > > > > Actually the above are kinda the same problem in a way: you need
> > > > > > complex storage, but if you need that you kinda need udev, and
> > > > > > services, and then also systemd and all that other stuff, and that's
> > > > > > why the system works like the system works right now.
> > > > > >
> > > > > > Whenever you devise a system like yours by cutting corners, and
> > > > > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > > > > don't want to support weird storage, you just solve your problem in a
> > > > > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > > > > actually really work without all that and are willing to maintain the
> > > > > > > solution for your specific problem only.
> > > > > > >
> > > > > > > As I understand you are trying to solve multiple problems at once
> > > > > > > here, and I think one should start with figuring out clearly what
> > > > > > > those are before trying to address them, maybe without compromising on
> > > > > > > security. So my guess is you want to address the following:
> > > > > > >
> > > > > > > 1. You don't want the whole big initrd to be read off disk on every
> > > > > > >boot, but only the parts of it that are actually needed.
> > > > > > >
> > > > > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > > > > >boot, but only the parts of it that are actually needed.
> > > > > >
> > > > > > 3. You want to share data between root fs and initrd
> > > > > >
> > > > > > 4. You want to save some boot time by not bringing up an init system
> > > > > >in the initrd once, then tearing it down again, and starting it
> > > > > >again from the root fs.
> > > > > >
> > > > > > For the items listed above I think you can find different solutions
> > > > > > which do not necessarily compromise security as much.
> > > > > >
> > > > > > So, in the list above you could address the latter three like this:
> > > > > >
> > > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > > >the kernel cmdline to synthesize a block device from that, which
> > > > > > >you then mount directly (without any initrd) via
> > > > > > >root=/dev/pmem0. This means your boot loader will still load the
> > > > > > >whole image into memory, but only decompress the bits actually
> > > > > > >needed. (It also has some other nice benefits I like, such as an
> > > > > > >immutable rootfs, which tmpfs-based initrds don't have.)
> > > > > > >
> > > > > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > > > 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 20:59, Luca Boccassi  wrote:
>
> On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
>  wrote:
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA512
> >
> > On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> > > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> > >  wrote:
> > > >
> > > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > > > >
> > > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > > > storage devices initialized. storage-init is a process that is not
> > > > > > > designed to replace init, it does just enough to initialize storage
> > > > > > > (performs a targeted udev trigger on storage), switches to
> > > > > > > initoverlayfs as root and then executes init.
> > > > > > >
> > > > > > > ```
> > > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > > >
> > > > > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > > > ```
> > > > >
> > > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > > there two lines?
> > > > >
> > > > > So, I generally would agree that the current initrd scheme is not
> > > > > ideal, and we have been discussing better approaches. But I am not
> > > > > sure your approach really is useful on generic systems for two
> > > > > reasons:
> > > > >
> > > > > 1. no security model? you need to authenticate your initrd in
> > > > > >2023. There's no excuse for not doing that anymore these days. Not
> > > > >in automotive, and not anywhere else really.
> > > > >
> > > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > > >unlock their root disks with TPM2 and similar things. People use
> > > > >RAID, LVM, and all that mess.
> > > > >
> > > > > Actually the above are kinda the same problem in a way: you need
> > > > > complex storage, but if you need that you kinda need udev, and
> > > > > services, and then also systemd and all that other stuff, and that's
> > > > > why the system works like the system works right now.
> > > > >
> > > > > Whenever you devise a system like yours by cutting corners, and
> > > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > > don't want to support weird storage, you just solve your problem in a
> > > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > > actually really work without all that and are willing to maintain the
> > > > > solution for your specific problem only.
> > > > >
> > > > > As I understand you are trying to solve multiple problems at once
> > > > > here, and I think one should start with figuring out clearly what
> > > > > those are before trying to address them, maybe without compromising on
> > > > > security. So my guess is you want to address the following:
> > > > >
> > > > > 1. You don't want the whole big initrd to be read off disk on every
> > > > >boot, but only the parts of it that are actually needed.
> > > > >
> > > > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > > >boot, but only the parts of it that are actually needed.
> > > > >
> > > > > 3. You want to share data between root fs and initrd
> > > > >
> > > > > 4. You want to save some boot time by not bringing up an init system
> > > > >in the initrd once, then tearing it down again, and starting it
> > > > >again from the root fs.
> > > > >
> > > > > For the items listed above I think you can find different solutions
> > > > > which do not necessarily compromise security as much.
> > > > >
> > > > > So, in the list above you could address the latter three like this:
> > > > >
> > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > >the kernel cmdline to synthesize a block device from that, which
> > > > > >you then mount directly (without any initrd) via
> > > > > >root=/dev/pmem0. This means your boot loader will still load the
> > > > > >whole image into memory, but only decompress the bits actually
> > > > > >needed. (It also has some other nice benefits I like, such as an
> > > > > >immutable rootfs, which tmpfs-based initrds don't have.)
> > > > > >
> > > > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > > > >systemd's eyes as an initrd (specifically: don't add an
> > > > >/etc/initrd-release file to it). Instead, just merge resources of
> > > > >the root fs into your initrd fs via overlayfs. systemd has
> > > > >infrastructure for this: "systemd-sysext". It takes immutable,
> > > > >authenticated erofs images (with verity, we call them "DDIs",
> > > > >i.e. 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 08:58:58PM +, Luca Boccassi wrote:
> On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
>  wrote:
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA512
> >
> > On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> > > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> > >  wrote:
> > > >
> > > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > > > >
> > > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > > > storage devices initialized. storage-init is a process that is not
> > > > > > > designed to replace init, it does just enough to initialize storage
> > > > > > > (performs a targeted udev trigger on storage), switches to
> > > > > > > initoverlayfs as root and then executes init.
> > > > > > >
> > > > > > > ```
> > > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > > >
> > > > > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > > > ```
> > > > >
> > > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > > there two lines?
> > > > >
> > > > > So, I generally would agree that the current initrd scheme is not
> > > > > ideal, and we have been discussing better approaches. But I am not
> > > > > sure your approach really is useful on generic systems for two
> > > > > reasons:
> > > > >
> > > > > 1. no security model? you need to authenticate your initrd in
> > > > > >2023. There's no excuse for not doing that anymore these days. Not
> > > > >in automotive, and not anywhere else really.
> > > > >
> > > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > > >unlock their root disks with TPM2 and similar things. People use
> > > > >RAID, LVM, and all that mess.
> > > > >
> > > > > Actually the above are kinda the same problem in a way: you need
> > > > > complex storage, but if you need that you kinda need udev, and
> > > > > services, and then also systemd and all that other stuff, and that's
> > > > > why the system works like the system works right now.
> > > > >
> > > > > Whenever you devise a system like yours by cutting corners, and
> > > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > > don't want to support weird storage, you just solve your problem in a
> > > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > > actually really work without all that and are willing to maintain the
> > > > > solution for your specific problem only.
> > > > >
> > > > > As I understand you are trying to solve multiple problems at once
> > > > > here, and I think one should start with figuring out clearly what
> > > > > those are before trying to address them, maybe without compromising on
> > > > > security. So my guess is you want to address the following:
> > > > >
> > > > > 1. You don't want the whole big initrd to be read off disk on every
> > > > >boot, but only the parts of it that are actually needed.
> > > > >
> > > > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > > >boot, but only the parts of it that are actually needed.
> > > > >
> > > > > 3. You want to share data between root fs and initrd
> > > > >
> > > > > 4. You want to save some boot time by not bringing up an init system
> > > > >in the initrd once, then tearing it down again, and starting it
> > > > >again from the root fs.
> > > > >
> > > > > For the items listed above I think you can find different solutions
> > > > > which do not necessarily compromise security as much.
> > > > >
> > > > > So, in the list above you could address the latter three like this:
> > > > >
> > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > >the kernel cmdline to synthesize a block device from that, which
> > > > > >you then mount directly (without any initrd) via
> > > > > >root=/dev/pmem0. This means your boot loader will still load the
> > > > > >whole image into memory, but only decompress the bits actually
> > > > > >needed. (It also has some other nice benefits I like, such as an
> > > > > >immutable rootfs, which tmpfs-based initrds don't have.)
> > > > > >
> > > > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > > > >systemd's eyes as an initrd (specifically: don't add an
> > > > >/etc/initrd-release file to it). Instead, just merge resources of
> > > > >the root fs into your initrd fs via overlayfs. systemd has
> > > > >infrastructure for this: "systemd-sysext". It takes immutable,
> > > > >authenticated erofs images (with verity, we call them "DDIs",
> > > > >  

Re: IPv6 Compliance for networkd

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> Hello, networkd developer community,
> 
> I am hoping to rally support for making networkd IPv6 compliant and I'm willing
> to help, but cannot do it alone. Is there any interest in making 
> systemd-networkd IPv6 compliant?
> 
> There are many organizations (especially US Government) that mandate IPv6 
> compliance (USGv6).  Products that are dependent on networkd cannot be bid to 
> these customers.
> 
> How do I engage with the right people in the developer community?
> 
> Thanks,
> Matt.
> PS: Mailing list topics go unanswered and github issues get lost in the 
> noise, so I'm hoping there's a more efficient way to collaborate.

In what specific ways is networkd not compliant?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Luca Boccassi
On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
 wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> >  wrote:
> > >
> > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > > >
> > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > storage devices initialized. storage-init is a process that is not
> > > > > designed to replace init, it does just enough to initialize storage
> > > > > (performs a targeted udev trigger on storage), switches to
> > > > > initoverlayfs as root and then executes init.
> > > > >
> > > > > ```
> > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > >
> > > > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > > ```
> > > >
> > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > there two lines?
> > > >
> > > > So, I generally would agree that the current initrd scheme is not
> > > > ideal, and we have been discussing better approaches. But I am not
> > > > sure your approach really is useful on generic systems for two
> > > > reasons:
> > > >
> > > > 1. no security model? you need to authenticate your initrd in
> > > > >2023. There's no excuse for not doing that anymore these days. Not
> > > >in automotive, and not anywhere else really.
> > > >
> > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > >unlock their root disks with TPM2 and similar things. People use
> > > >RAID, LVM, and all that mess.
> > > >
> > > > Actually the above are kinda the same problem in a way: you need
> > > > complex storage, but if you need that you kinda need udev, and
> > > > services, and then also systemd and all that other stuff, and that's
> > > > why the system works like the system works right now.
> > > >
> > > > Whenever you devise a system like yours by cutting corners, and
> > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > don't want to support weird storage, you just solve your problem in a
> > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > actually really work without all that and are willing to maintain the
> > > > solution for your specific problem only.
> > > >
> > > > As I understand you are trying to solve multiple problems at once
> > > > here, and I think one should start with figuring out clearly what
> > > > those are before trying to address them, maybe without compromising on
> > > > security. So my guess is you want to address the following:
> > > >
> > > > 1. You don't want the whole big initrd to be read off disk on every
> > > >boot, but only the parts of it that are actually needed.
> > > >
> > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > >boot, but only the parts of it that are actually needed.
> > > >
> > > > 3. You want to share data between root fs and initrd
> > > >
> > > > 4. You want to save some boot time by not bringing up an init system
> > > >in the initrd once, then tearing it down again, and starting it
> > > >again from the root fs.
> > > >
> > > > For the items listed above I think you can find different solutions
> > > > which do not necessarily compromise security as much.
> > > >
> > > > So, in the list above you could address the latter three like this:
> > > >
> > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > >the kernel cmdline to synthesize a block device from that, which
> > > > >you then mount directly (without any initrd) via
> > > > >root=/dev/pmem0. This means your boot loader will still load the
> > > > >whole image into memory, but only decompress the bits actually
> > > > >needed. (It also has some other nice benefits I like, such as an
> > > > >immutable rootfs, which tmpfs-based initrds don't have.)
> > > > >
> > > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > > >systemd's eyes as an initrd (specifically: don't add an
> > > >/etc/initrd-release file to it). Instead, just merge resources of
> > > >the root fs into your initrd fs via overlayfs. systemd has
> > > >infrastructure for this: "systemd-sysext". It takes immutable,
> > > >authenticated erofs images (with verity, we call them "DDIs",
> > > >i.e. "discoverable disk images") that it overlays into /usr/. [You
> > > >could also very nicely combine this approach with systemd's
> > > > >portable services, and nspawn containers, which operate on the same
> > > >authenticated images]. At MSFT we have a 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
>  wrote:
> >
> > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > >
> > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > storage devices initialized. storage-init is a process that is not
> > > > designed to replace init, it does just enough to initialize storage
> > > > (performs a targeted udev trigger on storage), switches to
> > > > initoverlayfs as root and then executes init.
> > > >
> > > > ```
> > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > >
> > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > ```
> > >
> > > I am not sure I follow what these chains are supposed to mean? Why are
> > > there two lines?
> > >
> > > So, I generally would agree that the current initrd scheme is not
> > > ideal, and we have been discussing better approaches. But I am not
> > > sure your approach really is useful on generic systems for two
> > > reasons:
> > >
> > > 1. no security model? you need to authenticate your initrd in
> > > >2023. There's no excuse for not doing that anymore these days. Not
> > >in automotive, and not anywhere else really.
> > >
> > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > >unlock their root disks with TPM2 and similar things. People use
> > >RAID, LVM, and all that mess.
> > >
> > > Actually the above are kinda the same problem in a way: you need
> > > complex storage, but if you need that you kinda need udev, and
> > > services, and then also systemd and all that other stuff, and that's
> > > why the system works like the system works right now.
> > >
> > > Whenever you devise a system like yours by cutting corners, and
> > > declaring that you don't want TPM, you don't want signed initrds, you
> > > don't want to support weird storage, you just solve your problem in a
> > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > actually really work without all that and are willing to maintain the
> > > solution for your specific problem only.
> > >
> > > As I understand you are trying to solve multiple problems at once
> > > here, and I think one should start with figuring out clearly what
> > > those are before trying to address them, maybe without compromising on
> > > security. So my guess is you want to address the following:
> > >
> > > 1. You don't want the whole big initrd to be read off disk on every
> > >boot, but only the parts of it that are actually needed.
> > >
> > > 2. You don't want the whole big initrd to be fully decompressed on every
> > >boot, but only the parts of it that are actually needed.
> > >
> > > 3. You want to share data between root fs and initrd
> > >
> > > 4. You want to save some boot time by not bringing up an init system
> > >in the initrd once, then tearing it down again, and starting it
> > >again from the root fs.
> > >
> > > For the items listed above I think you can find different solutions
> > > which do not necessarily compromise security as much.
> > >
> > > So, in the list above you could address the latter three like this:
> > >
> > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > >the kernel cmdline to synthesize a block device from that, which
> > > >you then mount directly (without any initrd) via
> > > >root=/dev/pmem0. This means your boot loader will still load the
> > > >whole image into memory, but only decompress the bits actually
> > > >needed. (It also has some other nice benefits I like, such as an
> > > >immutable rootfs, which tmpfs-based initrds don't have.)
> > > >
> > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > >systemd's eyes as an initrd (specifically: don't add an
> > >/etc/initrd-release file to it). Instead, just merge resources of
> > >the root fs into your initrd fs via overlayfs. systemd has
> > >infrastructure for this: "systemd-sysext". It takes immutable,
> > >authenticated erofs images (with verity, we call them "DDIs",
> > >i.e. "discoverable disk images") that it overlays into /usr/. [You
> > >could also very nicely combine this approach with systemd's
> > > >portable services, and nspawn containers, which operate on the same
> > >authenticated images]. At MSFT we have a major product that works
> > >exactly like this: the OS runs off a rootfs that is loaded as an
> > >initrd, and everything that runs on top of this are just these
> > >verity disk images, using overlayfs and portable services.
> > >
> > > 4. The proposal 
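
For reference, the memmap=X!Y suggestion in item 2 above can be sketched
as a small helper that builds the kernel command-line fragment. This is
only an illustration, not something from the thread: the function name
pmem_cmdline and the example address and size are hypothetical. It
assumes the kernel's memmap=nn!ss form, where nn is the region size and
ss is the start address, and that the boot loader has already placed the
erofs image at that address. Alignment and overlap with other memory are
not checked here.

```python
def pmem_cmdline(start: int, size: int, device: str = "/dev/pmem0") -> str:
    """Build kernel command-line arguments for a memmap-backed root device.

    start:  physical address where the boot loader placed the image
            (hypothetical input, must come from the boot loader)
    size:   length of the reserved region in bytes
    device: block device the kernel is expected to synthesize
    """
    if start < 0 or size <= 0:
        raise ValueError("start must be >= 0 and size must be > 0")
    # memmap=nn!ss marks the region from ss to ss+nn as persistent memory,
    # so the size comes first and the start address second.
    return f"memmap={size:#x}!{start:#x} root={device}"


if __name__ == "__main__":
    # Example: a 64 MiB image loaded at the 4 GiB boundary (made-up values).
    print(pmem_cmdline(start=0x1_0000_0000, size=64 * 1024 * 1024))
```

The values are emitted in hex so the size and address can be compared
directly against a firmware memory map.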

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Luca Boccassi
On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
 wrote:
>
> On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> >
> > > Here is the boot sequence with initoverlayfs integrated, the
> > > mini-initramfs contains just enough to get storage drivers loaded and
> > > storage devices initialized. storage-init is a process that is not
> > > designed to replace init, it does just enough to initialize storage
> > > (performs a targeted udev trigger on storage), switches to
> > > initoverlayfs as root and then executes init.
> > >
> > > ```
> > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > >
> > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > ```
> >
> > I am not sure I follow what these chains are supposed to mean? Why are
> > there two lines?
> >
> > So, I generally would agree that the current initrd scheme is not
> > ideal, and we have been discussing better approaches. But I am not
> > sure your approach really is useful on generic systems for two
> > reasons:
> >
> > 1. no security model? you need to authenticate your initrd in
> > >2023. There's no excuse for not doing that anymore these days. Not
> >in automotive, and not anywhere else really.
> >
> > 2. no way to deal with complex storage? i.e. people use FDE, want to
> >unlock their root disks with TPM2 and similar things. People use
> >RAID, LVM, and all that mess.
> >
> > Actually the above are kinda the same problem in a way: you need
> > complex storage, but if you need that you kinda need udev, and
> > services, and then also systemd and all that other stuff, and that's
> > why the system works like the system works right now.
> >
> > Whenever you devise a system like yours by cutting corners, and
> > declaring that you don't want TPM, you don't want signed initrds, you
> > don't want to support weird storage, you just solve your problem in a
> > very specific way, ignoring the big picture. Which is OK, *if* you can
> > actually really work without all that and are willing to maintain the
> > solution for your specific problem only.
> >
> > As I understand you are trying to solve multiple problems at once
> > here, and I think one should start with figuring out clearly what
> > those are before trying to address them, maybe without compromising on
> > security. So my guess is you want to address the following:
> >
> > 1. You don't want the whole big initrd to be read off disk on every
> >boot, but only the parts of it that are actually needed.
> >
> > 2. You don't want the whole big initrd to be fully decompressed on every
> >boot, but only the parts of it that are actually needed.
> >
> > 3. You want to share data between root fs and initrd
> >
> > 4. You want to save some boot time by not bringing up an init system
> >in the initrd once, then tearing it down again, and starting it
> >again from the root fs.
> >
> > For the items listed above I think you can find different solutions
> > which do not necessarily compromise security as much.
> >
> > So, in the list above you could address the latter three like this:
> >
> > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > >the kernel cmdline to synthesize a block device from that, which
> > >you then mount directly (without any initrd) via
> > >root=/dev/pmem0. This means your boot loader will still load the
> > >whole image into memory, but only decompress the bits actually
> > >needed. (It also has some other nice benefits I like, such as an
> >immutable rootfs, which tmpfs-based initrds don't have.)
> >
> > > 3. Simply never transition to the root fs, don't mark the initrds in
> >systemd's eyes as an initrd (specifically: don't add an
> >/etc/initrd-release file to it). Instead, just merge resources of
> >the root fs into your initrd fs via overlayfs. systemd has
> >infrastructure for this: "systemd-sysext". It takes immutable,
> >authenticated erofs images (with verity, we call them "DDIs",
> >i.e. "discoverable disk images") that it overlays into /usr/. [You
> >could also very nicely combine this approach with systemd's
> > >portable services, and nspawn containers, which operate on the same
> >authenticated images]. At MSFT we have a major product that works
> >exactly like this: the OS runs off a rootfs that is loaded as an
> >initrd, and everything that runs on top of this are just these
> >verity disk images, using overlayfs and portable services.
> >
> > 4. The proposal in 3 also addresses goal 4.
> >
> > Which leaves item 1, which is a bit harder to address. We have been
> > > discussing this off and on internally too. A generic solution to this
> > is hard. My current thinking for this could be something like this,
> > covering the UEFI world: support sticking a 

IPv6 Compliance for networkd

2023-12-11 Thread Muggeridge, Matt
Hello, networkd developer community,

I am hoping to rally support for making networkd IPv6 compliant and I'm willing to 
help, but cannot do it alone. Is there any interest in making systemd-networkd 
IPv6 compliant?

There are many organizations (especially US Government) that mandate IPv6 
compliance (USGv6).  Products that are dependent on networkd cannot be bid to 
these customers.

How do I engage with the right people in the developer community?

Thanks,
Matt.
PS: Mailing list topics go unanswered and github issues get lost in the noise, 
so I'm hoping there's a more efficient way to collaborate.


Re: [systemd-devel] Manual start of user@.service failed with permission denied

2023-12-11 Thread Andrei Borzenkov

On 11.12.2023 18:28, Christopher Wong wrote:

Hi Mantas,

I have added ExecStartPre to user@.service to run “id” 
and “ls -la”:

Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Will mount /run/user/1001 
owned by 1001:118
Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Mounting tmpfs (tmpfs) on 
/run/user/1001 (MS_NOSUID|MS_NODEV 
"mode=0700,uid=1001,gid=118,size=99426304,nr_inodes=24274")...
Dec 11 15:50:34 host systemd[1]: Finished User Runtime Directory /run/user/1001.
Dec 11 15:50:34 host systemd[1]: Starting User Manager for UID 1001...
Dec 11 15:50:34 host id[40291]: uid=1001(ida) gid=118(ssh-users) 
groups=118(ssh-users),236(systemd-journal)
Dec 11 15:50:34 host ls[40293]: drwxr-xr-x    3 root root    60 Dec 
11 15:50 .
Dec 11 15:50:34 host ls[40293]: drwxr-xr-x   98 root root  2120 Dec 
11 15:30 ..
Dec 11 15:50:34 host ls[40293]: drwx------    2 root root    40 Dec 
11 15:50 1001
Dec 11 15:50:34 host systemd[40294]: systemd 254.7-2-g9edc143 running in user 
mode for user 1001/ida. (-PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK +SECCOMP 
+GCRYPT +GNUTLS +OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 -IDN2 -IDN -IPTC 
+KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT -QRENCODE -TPM2 +BZIP2 
-LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON -UTMP -SYSVINIT 
default-hierarchy=unified)

The /run/user/1001 belongs to root with mode 0700. Should this belong to root? 


No.


Is it because I manually start user@1001.service as 
root?


No.


However, after user-runtime-dir@1001.service has 
finished its startup, the user@1001.service is started as 
uid=1001 and therefore can’t create any directories under /run/user/1001. As a result, 
user@1001.service fails to start.

If I add “ExecStartPre=+chown %i /run/user/%i” to 
user@.service then it works! But I am unsure if this is 
really the way to fix this.


As the logs clearly show, systemd-user-runtime-dir mounts a tmpfs with 
option uid=1001 over /run/user/1001. Is it still a mounted filesystem 
when you check it? It sounds like you are seeing the underlying mount point 
directory, which indeed has permissions 700 and owner root, not the mounted 
filesystem.
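
A minimal sketch of how one might check this on the affected host. The helper 
names (uid_from_opts, check_runtime_dir) are made up for illustration, and the 
script assumes that mountpoint and findmnt from util-linux and a stat that 
supports -c are installed:

```shell
#!/bin/sh
# Sketch: tell the bare mount point directory apart from the tmpfs on top of it.
# The path used in the example comes from the logs in this thread.

# Extract the value of uid= from a comma-separated mount option string.
uid_from_opts() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^uid=//p'
}

# Report whether DIR is currently a mount point, and who owns what is visible.
check_runtime_dir() {
    dir=$1
    if mountpoint -q "$dir"; then
        opts=$(findmnt -n -o OPTIONS -M "$dir")
        echo "$dir is mounted, uid option: $(uid_from_opts "$opts")"
    else
        echo "$dir is NOT a mount point (this is the underlying directory)"
    fi
    # Owner and mode of whatever is visible at the path right now.
    stat -c 'owner=%u:%g mode=%a' "$dir"
}

# Example invocation (run as root):
# check_runtime_dir /run/user/1001
```

If the directory is reported as not being a mount point while 
user-runtime-dir@1001.service is active, the root-owned 0700 directory is just 
the mount point itself and the tmpfs has gone away (or was never mounted).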




Regarding the testing, I have done both restart of everything and manual, but 
the result is the same. Now that I have the 
“Environment=XDG_RUNTIME_DIR=/run/user/%i” I no longer need to do “systemctl 
set-environment …”

Thank you for taking your time!

Best regards,
Christopher Wong


From: Mantas Mikulėnas 
Date: Friday, 8 December 2023 at 21:53
To: Christopher Wong 
Cc: Systemd 
Subject: Re: [systemd-devel] Manual start of user@.service failed with 
permission denied
On Fri, Dec 8, 2023 at 6:53 PM Christopher Wong 
<christopher.w...@axis.com> wrote:
Hi Mantas,

I have from your suggestion done the following:

Putting the below in user@.service

[Service]
...
Environment=XDG_RUNTIME_DIR=/run/user/%i
Environment=SYSTEMD_LOG_LEVEL=debug

Putting the below in user-runtime-dir@.service

[Service]
...
Environment=SYSTEMD_LOG_LEVEL=debug

Then I have disabled the global set-log-level debug (if this is also required, 
please let me know).

Unlike set-environment that's not global, it only affects pid1.


What I can see from the logs is that 
user-runtime-dir@1001.service seems to be 
started and mount /run/user/1001, but additional creation of directories under this mount 
is getting permission denied.

Dec 08 17:33:29 host systemd[1]: Created slice User Slice of UID 1001.
Dec 08 17:33:29 host systemd[1]: Starting User Runtime Directory 
/run/user/1001...
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
UNSET -> OPENING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: sd-bus: starting bus by 
connecting to /run/dbus/system_bus_socket...
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
OPENING -> AUTHENTICATING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
AUTHENTICATING -> HELLO
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Sent message 
type=method_call sender=n/a destination=org.freedesktop.DBus 
path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 
reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Got message 
type=method_return sender=org.freedesktop.DBus destination=:1.2536 path=n/a 
interface=n/a member=n/a  cookie=1 reply_cookie=1 signature=s error-name=n/a 
error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
HELLO -> RUNNING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Sent message 
type=method_call sender=n/a destination=org.freedesktop.login1 
path=/org/freedesktop/login1 interface=org.freedesktop.DBus.Properties 
member=Get cookie=2 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Mon, Dec 11, 2023 at 05:03:13PM +, Eric Curtin wrote:
> On Mon, 11 Dec 2023 at 16:36, Demi Marie Obenour
>  wrote:
> >
> > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > >
> > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > storage devices initialized. storage-init is a process that is not
> > > > designed to replace init, it does just enough to initialize storage
> > > > (performs a targeted udev trigger on storage), switches to
> > > > initoverlayfs as root and then executes init.
> > > >
> > > > ```
> > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > >
> > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > ```
> > >
> > > I am not sure I follow what these chains are supposed to mean? Why are
> > > there two lines?
> > >
> > > So, I generally would agree that the current initrd scheme is not
> > > ideal, and we have been discussing better approaches. But I am not
> > > sure your approach really is useful on generic systems for two
> > > reasons:
> > >
> > > 1. no security model? you need to authenticate your initrd in
> > > >2023. There's no excuse for not doing that anymore these days. Not
> > >in automotive, and not anywhere else really.
> > >
> > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > >unlock their root disks with TPM2 and similar things. People use
> > >RAID, LVM, and all that mess.
> > >
> > > Actually the above are kinda the same problem in a way: you need
> > > complex storage, but if you need that you kinda need udev, and
> > > services, and then also systemd and all that other stuff, and that's
> > > why the system works like the system works right now.
> > >
> > > Whenever you devise a system like yours by cutting corners, and
> > > declaring that you don't want TPM, you don't want signed initrds, you
> > > don't want to support weird storage, you just solve your problem in a
> > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > actually really work without all that and are willing to maintain the
> > > solution for your specific problem only.
> > >
> > > As I understand you are trying to solve multiple problems at once
> > > here, and I think one should start with figuring out clearly what
> > > those are before trying to address them, maybe without compromising on
> > > security. So my guess is you want to address the following:
> > >
> > > 1. You don't want the whole big initrd to be read off disk on every
> > >boot, but only the parts of it that are actually needed.
> > >
> > > 2. You don't want the whole big initrd to be fully decompressed on every
> > >boot, but only the parts of it that are actually needed.
> > >
> > > 3. You want to share data between root fs and initrd
> > >
> > > 4. You want to save some boot time by not bringing up an init system
> > >in the initrd once, then tearing it down again, and starting it
> > >again from the root fs.
> > >
> > > For the items listed above I think you can find different solutions
> > > which do not necessarily compromise security as much.
> > >
> > > So, in the list above you could address the latter three like this:
> > >
> > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > >the kernel cmdline to synthesize a block device from that, which
> > > >you then mount directly (without any initrd) via
> > > >root=/dev/pmem0. This means your boot loader will still load the
> > > >whole image into memory, but only decompress the bits actually
> > > >needed. (It also has some other nice benefits I like, such as an
> > >immutable rootfs, which tmpfs-based initrds don't have.)
> > >
> > > > 3. Simply never transition to the root fs, don't mark the initrds in
> > >systemd's eyes as an initrd (specifically: don't add an
> > >/etc/initrd-release file to it). Instead, just merge resources of
> > >the root fs into your initrd fs via overlayfs. systemd has
> > >infrastructure for this: "systemd-sysext". It takes immutable,
> > >authenticated erofs images (with verity, we call them "DDIs",
> > >i.e. "discoverable disk images") that it overlays into /usr/. [You
> > >could also very nicely combine this approach with systemd's
> > > >portable services, and nspawn containers, which operate on the same
> > >authenticated images]. At MSFT we have a major product that works
> > >exactly like this: the OS runs off a rootfs that is loaded as an
> > >initrd, and everything that runs on top of this are just these
> > >verity disk images, using overlayfs and portable services.
> > >
> > > 4. The proposal in 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Neal Gompa
On Mon, Dec 11, 2023 at 12:30 PM Demi Marie Obenour
 wrote:
>
> On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> >
> > > Here is the boot sequence with initoverlayfs integrated, the
> > > mini-initramfs contains just enough to get storage drivers loaded and
> > > storage devices initialized. storage-init is a process that is not
> > > designed to replace init, it does just enough to initialize storage
> > > (performs a targeted udev trigger on storage), switches to
> > > initoverlayfs as root and then executes init.
> > >
> > > ```
> > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > >
> > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > ```
> >
> > I am not sure I follow what these chains are supposed to mean? Why are
> > there two lines?
> >
> > So, I generally would agree that the current initrd scheme is not
> > ideal, and we have been discussing better approaches. But I am not
> > sure your approach really is useful on generic systems for two
> > reasons:
> >
> > 1. no security model? you need to authenticate your initrd in
> > >2023. There's no excuse for not doing that anymore these days. Not
> >in automotive, and not anywhere else really.
> >
> > 2. no way to deal with complex storage? i.e. people use FDE, want to
> >unlock their root disks with TPM2 and similar things. People use
> >RAID, LVM, and all that mess.
> >
> > Actually the above are kinda the same problem in a way: you need
> > complex storage, but if you need that you kinda need udev, and
> > services, and then also systemd and all that other stuff, and that's
> > why the system works like the system works right now.
> >
> > Whenever you devise a system like yours by cutting corners, and
> > declaring that you don't want TPM, you don't want signed initrds, you
> > don't want to support weird storage, you just solve your problem in a
> > very specific way, ignoring the big picture. Which is OK, *if* you can
> > actually really work without all that and are willing to maintain the
> > solution for your specific problem only.
> >
> > As I understand you are trying to solve multiple problems at once
> > here, and I think one should start with figuring out clearly what
> > those are before trying to address them, maybe without compromising on
> > security. So my guess is you want to address the following:
> >
> > 1. You don't want the whole big initrd to be read off disk on every
> >boot, but only the parts of it that are actually needed.
> >
> > 2. You don't want the whole big initrd to be fully decompressed on every
> >boot, but only the parts of it that are actually needed.
> >
> > 3. You want to share data between root fs and initrd
> >
> > 4. You want to save some boot time by not bringing up an init system
> >in the initrd once, then tearing it down again, and starting it
> >again from the root fs.
> >
> > For the items listed above I think you can find different solutions
> > which do not necessarily compromise security as much.
> >
> > So, in the list above you could address the latter three like this:
> >
> > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > >the kernel cmdline to synthesize a block device from that, which
> > >you then mount directly (without any initrd) via
> > >root=/dev/pmem0. This means your boot loader will still load the
> > >whole image into memory, but only decompress the bits actually
> > >needed. (It also has some other nice benefits I like, such as an
> >immutable rootfs, which tmpfs-based initrds don't have.)
> >
> > > 3. Simply never transition to the root fs, don't mark the initrds in
> >systemd's eyes as an initrd (specifically: don't add an
> >/etc/initrd-release file to it). Instead, just merge resources of
> >the root fs into your initrd fs via overlayfs. systemd has
> >infrastructure for this: "systemd-sysext". It takes immutable,
> >authenticated erofs images (with verity, we call them "DDIs",
> >i.e. "discoverable disk images") that it overlays into /usr/. [You
> >could also very nicely combine this approach with systemd's
> > >portable services, and nspawn containers, which operate on the same
> >authenticated images]. At MSFT we have a major product that works
> >exactly like this: the OS runs off a rootfs that is loaded as an
> >initrd, and everything that runs on top of this are just these
> >verity disk images, using overlayfs and portable services.
> >
> > 4. The proposal in 3 also addresses goal 4.
> >
> > Which leaves item 1, which is a bit harder to address. We have been
> > > discussing this off and on internally too. A generic solution to this
> > is hard. My current thinking for this could be something like this,
> > covering the UEFI world: support sticking 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 16:36, Demi Marie Obenour
 wrote:
>
> On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> >
> > > Here is the boot sequence with initoverlayfs integrated, the
> > > mini-initramfs contains just enough to get storage drivers loaded and
> > > storage devices initialized. storage-init is a process that is not
> > > designed to replace init, it does just enough to initialize storage
> > > (performs a targeted udev trigger on storage), switches to
> > > initoverlayfs as root and then executes init.
> > >
> > > ```
> > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > >
> > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > ```
> >
> > I am not sure I follow what these chains are supposed to mean? Why are
> > there two lines?
> >
> > So, I generally would agree that the current initrd scheme is not
> > ideal, and we have been discussing better approaches. But I am not
> > sure your approach really is useful on generic systems for two
> > reasons:
> >
> > 1. no security model? you need to authenticate your initrd in
> > >2023. There's no excuse for not doing that anymore these days. Not
> >in automotive, and not anywhere else really.
> >
> > 2. no way to deal with complex storage? i.e. people use FDE, want to
> >unlock their root disks with TPM2 and similar things. People use
> >RAID, LVM, and all that mess.
> >
> > Actually the above are kinda the same problem in a way: you need
> > complex storage, but if you need that you kinda need udev, and
> > services, and then also systemd and all that other stuff, and that's
> > why the system works like the system works right now.
> >
> > Whenever you devise a system like yours by cutting corners, and
> > declaring that you don't want TPM, you don't want signed initrds, you
> > don't want to support weird storage, you just solve your problem in a
> > very specific way, ignoring the big picture. Which is OK, *if* you can
> > actually really work without all that and are willing to maintain the
> > solution for your specific problem only.
> >
> > As I understand you are trying to solve multiple problems at once
> > here, and I think one should start with figuring out clearly what
> > those are before trying to address them, maybe without compromising on
> > security. So my guess is you want to address the following:
> >
> > 1. You don't want the whole big initrd to be read off disk on every
> >boot, but only the parts of it that are actually needed.
> >
> > 2. You don't want the whole big initrd to be fully decompressed on every
> >boot, but only the parts of it that are actually needed.
> >
> > 3. You want to share data between root fs and initrd
> >
> > 4. You want to save some boot time by not bringing up an init system
> >in the initrd once, then tearing it down again, and starting it
> >again from the root fs.
> >
> > For the items listed above I think you can find different solutions
> > which do not necessarily compromise security as much.
> >
> > So, in the list above you could address the latter three like this:
> >
> > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > >loader load the erofs into contiguous memory, then use memmap=X!Y on
> > >the kernel cmdline to synthesize a block device from that, which
> > >you then mount directly (without any initrd) via
> > >root=/dev/pmem0. This means your boot loader will still load the
> > >whole image into memory, but only decompress the bits actually
> > >needed. (It also has some other nice benefits I like, such as an
> >immutable rootfs, which tmpfs-based initrds don't have.)
> >
> > > 3. Simply never transition to the root fs, don't mark the initrds in
> >systemd's eyes as an initrd (specifically: don't add an
> >/etc/initrd-release file to it). Instead, just merge resources of
> >the root fs into your initrd fs via overlayfs. systemd has
> >infrastructure for this: "systemd-sysext". It takes immutable,
> >authenticated erofs images (with verity, we call them "DDIs",
> >i.e. "discoverable disk images") that it overlays into /usr/. [You
> >could also very nicely combine this approach with systemd's
> > >portable services, and nspawn containers, which operate on the same
> >authenticated images]. At MSFT we have a major product that works
> >exactly like this: the OS runs off a rootfs that is loaded as an
> >initrd, and everything that runs on top of this are just these
> >verity disk images, using overlayfs and portable services.
> >
> > 4. The proposal in 3 also addresses goal 4.
> >
> > Which leaves item 1, which is a bit harder to address. We have been
> > > discussing this off and on internally too. A generic solution to this
> > is hard. My current thinking for this could be something like this,
> > covering the UEFI world: support sticking a 

Re: [systemd-devel] Manual start of user@.service failed with permission denied

2023-12-11 Thread Mantas Mikulėnas
On Mon, Dec 11, 2023, 17:28 Christopher Wong 
wrote:

> Hi Mantas,
>
>
>
> I have added ExecStartPre to user@.service to run “id” and “ls -la”:
>
>
>
> Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Will mount
> /run/user/1001 owned by 1001:118
>
> Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Mounting tmpfs
> (tmpfs) on /run/user/1001 (MS_NOSUID|MS_NODEV
> "mode=0700,uid=1001,gid=118,size=99426304,nr_inodes=24274")...
>
> Dec 11 15:50:34 host systemd[1]: Finished User Runtime Directory
> /run/user/1001.
>
> Dec 11 15:50:34 host systemd[1]: Starting User Manager for UID 1001...
>
> Dec 11 15:50:34 host id[40291]: uid=1001(ida) gid=118(ssh-users)
> groups=118(ssh-users),236(systemd-journal)
>
> Dec 11 15:50:34 host ls[40293]: drwxr-xr-x    3 root root
> 60 Dec 11 15:50 .
>
> Dec 11 15:50:34 host ls[40293]: drwxr-xr-x   98 root root
> 2120 Dec 11 15:30 ..
>
> Dec 11 15:50:34 host ls[40293]: drwx------    2 root root
> 40 Dec 11 15:50 1001
>
> Dec 11 15:50:34 host systemd[40294]: systemd 254.7-2-g9edc143 running in
> user mode for user 1001/ida. (-PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK
> +SECCOMP +GCRYPT +GNUTLS +OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 -IDN2
> -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT
> -QRENCODE -TPM2 +BZIP2 -LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON -UTMP
> -SYSVINIT default-hierarchy=unified)
>
>
>
> The /run/user/1001 belongs to root with mode 0700. Should this belong to
> root?
>
No, it should be owned by UID 1001 (though still mode 0700).

> Is it because I manually start user@1001.service as root?
>
Which user started the .service is usually not important; all services get
a "fresh" environment that's fully described by the unit file.

So even if you did 'systemctl start' as root, the unit has User=%i, so the
instance parameter tells it which UID to run as, and it will be running as UID
1001. Likewise, user-runtime-dir@1001 will get the UID for the mount from
its instance name (you can see that the "Mounting tmpfs" message has the
correct information).

> However, after user-runtime-dir@1001.service has finished its startup,
> the user@1001.service is started as uid=1001 and therefore can’t create
> any directories under /run/user/1001. As a result, user@1001.service
> fails to start.
>
>
>
> If I add “ExecStartPre=+chown %i /run/user/%i” to user@.service then it
> works! But I am unsure if this is really the way to fix this.
>

So far, it sounds like the directory is being created *by something else*
before user-runtime-dir@ is even invoked.

Try adding the same "-/bin/ls -lad /run/user/%i" as both ExecStartPre and
ExecStartPost of user-runtime-dir@ (and maybe even a findmnt). If the
directory already exists during ExecStartPre, start looking for other
services or cronjobs, or tmpfiles.d configs, or 'su' invocations, which may
cause it to be created.

There might also be something that chowns it to root *after* it was created
correctly. If you actually see the tmpfs mount in 'findmnt' or in 'mount',
but it's owned by root despite having uid=1001 in its mount options,
something has chowned it...or your tmpfs feature is broken.

If you don't see it in findmnt at all, even after user-runtime-dir has
succeeded – either the mount failed quietly, or... something (like systemd
itself) has quietly unmounted it.
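
The ExecStartPre/ExecStartPost suggestion above could be wired up roughly like 
this. It is only a sketch: the function names and the drop-in file name 
(debug-ls.conf) are arbitrary choices, and the /usr/bin/findmnt path is an 
assumption that may differ on your distribution.

```shell
#!/bin/sh
# Sketch: generate a debugging drop-in for user-runtime-dir@.service.

# Print the drop-in contents to stdout.
print_dropin() {
    cat <<'EOF'
[Service]
# A leading "-" means a failing command does not fail the unit.
ExecStartPre=-/bin/ls -lad /run/user/%i
ExecStartPost=-/bin/ls -lad /run/user/%i
ExecStartPost=-/usr/bin/findmnt /run/user/%i
EOF
}

# Write the drop-in below the given unit directory
# (default: /etc/systemd/system).
install_dropin() {
    unit_dir=${1:-/etc/systemd/system}/user-runtime-dir@.service.d
    mkdir -p "$unit_dir" && print_dropin > "$unit_dir/debug-ls.conf"
}

# Example (as root):
# install_dropin && systemctl daemon-reload
# systemctl restart user-runtime-dir@1001.service
# journalctl -b -u user-runtime-dir@1001.service
```

Comparing the ls output from before and after the mount should show whether 
the directory already existed, and whether the tmpfs is really there once the 
unit has finished starting.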


>
> Regarding the testing, I have done both restart of everything and manual,
> but the result is the same. Now that I have the
> “Environment=XDG_RUNTIME_DIR=/run/user/%i” I no longer need to do
> “systemctl set-environment …”
>
>
>
> Thank you for taking your time!
>
>
>
> Best regards,
>
> Christopher Wong
>
>
>
>
>
> *From: *Mantas Mikulėnas 
> *Date: *Friday, 8 December 2023 at 21:53
> *To: *Christopher Wong 
> *Cc: *Systemd 
> *Subject: *Re: [systemd-devel] Manual start of user@.service failed
> with permission denied
>
> On Fri, Dec 8, 2023 at 6:53 PM Christopher Wong 
> wrote:
>
> Hi Mantas,
>
>
>
> I have from your suggestion done the following:
>
>
>
> Putting the below in user@.service
>
>
>
> [Service]
>
> ...
>
> Environment=XDG_RUNTIME_DIR=/run/user/%i
>
> Environment=SYSTEMD_LOG_LEVEL=debug
>
>
>
> Putting the below in user-runtime-dir@.service
>
>
>
> [Service]
>
> ...
>
> Environment=SYSTEMD_LOG_LEVEL=debug
>
>
>
> Then I have disabled the global set-log-level debug (if this is also
> required, please let me know).
>
>
>
> Unlike set-environment that's not global, it only affects pid1.
>
>
>
>
>
> What I can see from the logs is that user-runtime-dir@1001.service seems
> to be started and mount /run/user/1001, but additional creation of directories
> under this mount is getting permission denied.
>
>
>
> Dec 08 17:33:29 host systemd[1]: Created slice User Slice of UID 1001.
>
> Dec 08 17:33:29 host systemd[1]: Starting User Runtime Directory
> /run/user/1001...
>
> Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing
> state UNSET -> OPENING
>
> Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> 
> > Here is the boot sequence with initoverlayfs integrated, the
> > mini-initramfs contains just enough to get storage drivers loaded and
> > storage devices initialized. storage-init is a process that is not
> > designed to replace init, it does just enough to initialize storage
> > (performs a targeted udev trigger on storage), switches to
> > initoverlayfs as root and then executes init.
> >
> > ```
> > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> >
> > fw -> bootloader -> kernel -> storage-init   -> init ->
> > ```
> 
> I am not sure I follow what these chains are supposed to mean? Why are
> there two lines?
> 
> So, I generally would agree that the current initrd scheme is not
> ideal, and we have been discussing better approaches. But I am not
> sure your approach really is useful on generic systems for two
> reasons:
> 
> 1. no security model? you need to authenticate your initrd in
>2023. There's no excuse for not doing that anymore these days. Not
>in automotive, and not anywhere else really.
> 
> 2. no way to deal with complex storage? i.e. people use FDE, want to
>unlock their root disks with TPM2 and similar things. People use
>RAID, LVM, and all that mess.
> 
> Actually the above are kinda the same problem in a way: you need
> complex storage, but if you need that you kinda need udev, and
> services, and then also systemd and all that other stuff, and that's
> why the system works like the system works right now.
> 
> Whenever you devise a system like yours by cutting corners, and
> declaring that you don't want TPM, you don't want signed initrds, you
> don't want to support weird storage, you just solve your problem in a
> very specific way, ignoring the big picture. Which is OK, *if* you can
> actually really work without all that and are willing to maintain the
> solution for your specific problem only.
> 
> As I understand you are trying to solve multiple problems at once
> here, and I think one should start with figuring out clearly what
> those are before trying to address them, maybe without compromising on
> security. So my guess is you want to address the following:
> 
> 1. You don't want the whole big initrd to be read off disk on every
>    boot, but only the parts of it that are actually needed.
> 
> 2. You don't want the whole big initrd to be fully decompressed on every
>    boot, but only the parts of it that are actually needed.
> 
> 3. You want to share data between root fs and initrd
> 
> 4. You want to save some boot time by not bringing up an init system
>    in the initrd once, then tearing it down again, and starting it
>    again from the root fs.
> 
> For the items listed above I think you can find different solutions
> which do not necessarily compromise security as much.
> 
> So, in the list above you could address the latter three like this:
> 
> 2. Use an erofs rather than a packed cpio as initrd. Make the boot
>    loader load the erofs into contiguous memory, then use memmap=X!Y on
>    the kernel cmdline to synthesize a block device from that, which
>    you then mount directly (without any initrd) via
>    root=/dev/pmem0. This means your boot loader will still load the
>    whole image into memory, but only decompress the bits actually
>    needed. (It also has some other nice benefits I like, such as an
>    immutable rootfs, which tmpfs-based initrds don't have.)
> 
> 3. Simply never transition to the root fs, don't mark the initrds in
>    systemd's eyes as an initrd (specifically: don't add an
>    /etc/initrd-release file to it). Instead, just merge resources of
>    the root fs into your initrd fs via overlayfs. systemd has
>    infrastructure for this: "systemd-sysext". It takes immutable,
>    authenticated erofs images (with verity, we call them "DDIs",
>    i.e. "discoverable disk images") that it overlays into /usr/. [You
>    could also very nicely combine this approach with systemd's
>    portable services, and nspawn containers, which operate on the same
>    authenticated images]. At MSFT we have a major product that works
>    exactly like this: the OS runs off a rootfs that is loaded as an
>    initrd, and everything that runs on top of this are just these
>    verity disk images, using overlayfs and portable services.
> 
> 4. The proposal in 3 also addresses goal 4.
> 
> Which leaves item 1, which is a bit harder to address. We have been
> discussing this off and on internally too. A generic solution to this
> is hard. My current thinking for this could be something like this,
> covering the UEFI world: support sticking a DDI for the main initrd in
> the ESP. The ESP is by definition unencrypted and unauthenticated,
> but otherwise relatively well defined, i.e. known to be vfat and
> discoverable via UUID on a GPT disk. So: build a minimal
> 

Re: [systemd-devel] Manual start of user@.service failed with permission denied

2023-12-11 Thread Christopher Wong
Hi Andrei,

As the logs indicate, neither SELinux nor AppArmor is enabled.

Best regards,
Christopher Wong


From: systemd-devel  on behalf of 
Andrei Borzenkov 
Date: Saturday, 9 December 2023 at 07:13
To: systemd-devel@lists.freedesktop.org 
Subject: Re: [systemd-devel] Manual start of user@.service failed with 
permission denied
On 08.12.2023 23:53, Mantas Mikulėnas wrote:
...

>>
>> Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Will mount
>> /run/user/1001 owned by 1001:118
>>
>> Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Mounting tmpfs
>> (tmpfs) on /run/user/1001 (MS_NOSUID|MS_NODEV
>> "mode=0700,uid=1001,gid=118,size=99426304,nr_inodes=24274")...
>>
>> Dec 08 17:33:29 host systemd[1]: Finished User Runtime Directory
>> /run/user/1001.
>>
>> Dec 08 17:33:29 host systemd[1]: Starting User Manager for UID 1001...
>>
>> Dec 08 17:33:29 host systemd[36280]: systemd 254.7-2-g9edc143 running in
>> user mode for user 1001/ida. (-PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK
>> +SECCOMP +GCRYPT +GNUTLS +OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 -IDN2
>> -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT
>> -QRENCODE -TPM2 +BZIP2 -LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON -UTMP
>> -SYSVINIT default-hierarchy=unified)
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/reg', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/dir', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/fifo', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/sock', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/chr', ignoring: Permission denied
>>
>> Dec 08 17:33:29 host systemd[36280]: Failed to create
>> '/run/user/1001/systemd/inaccessible/blk', ignoring: Permission denied
>>
>
> What's the ownership of /run/user/1001 and /run/user/1001/systemd after all
> of this?
>
> Are you rebooting between tests or just manually starting it?
>
> My current guess is that due to the earlier `systemctl set-environment`,
> some *other* thing that's running as root inherited the /run/user/1001 path
> and created root-owned directories there? That's the issue with setting
> global environment, it needs to be unset afterwards...
>

"Permission denied" sounds like something LSM related (AppArmor,
SELinux, ...)


Re: [systemd-devel] Manual start of user@.service failed with permission denied

2023-12-11 Thread Christopher Wong
Hi Mantas,

I have added ExecStartPre to user@.service to run “id” and “ls -la”:

Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Will mount /run/user/1001 
owned by 1001:118
Dec 11 15:50:34 host systemd-user-runtime-dir[40287]: Mounting tmpfs (tmpfs) on 
/run/user/1001 (MS_NOSUID|MS_NODEV 
"mode=0700,uid=1001,gid=118,size=99426304,nr_inodes=24274")...
Dec 11 15:50:34 host systemd[1]: Finished User Runtime Directory /run/user/1001.
Dec 11 15:50:34 host systemd[1]: Starting User Manager for UID 1001...
Dec 11 15:50:34 host id[40291]: uid=1001(ida) gid=118(ssh-users) 
groups=118(ssh-users),236(systemd-journal)
Dec 11 15:50:34 host ls[40293]: drwxr-xr-x    3 root root    60 Dec 11 15:50 .
Dec 11 15:50:34 host ls[40293]: drwxr-xr-x   98 root root  2120 Dec 11 15:30 ..
Dec 11 15:50:34 host ls[40293]: drwx------    2 root root    40 Dec 11 15:50 1001
Dec 11 15:50:34 host systemd[40294]: systemd 254.7-2-g9edc143 running in user 
mode for user 1001/ida. (-PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK +SECCOMP 
+GCRYPT +GNUTLS +OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 -IDN2 -IDN -IPTC 
+KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT -QRENCODE -TPM2 +BZIP2 
-LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON -UTMP -SYSVINIT 
default-hierarchy=unified)

/run/user/1001 belongs to root with mode 0700. Should it belong to root?
Is that because I start user@1001.service manually as root?
However, after user-runtime-dir@1001.service has finished starting up,
user@1001.service is started as uid=1001 and therefore can’t create any
directories under /run/user/1001. As a result, user@1001.service fails
to start.

If I add “ExecStartPre=+chown %i /run/user/%i” to user@.service, then it
works! But I am unsure whether this is really the right way to fix this.
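
For anyone reproducing this, one quick way to confirm the ownership problem
before and after a workaround is to inspect the directory directly. The
following is only a minimal sketch in Python: check_runtime_dir is a
hypothetical helper name, and the expected values (owner equal to the user's
UID, mode 0700) are taken from the mount options shown in the log above.

```python
import os
import stat


def check_runtime_dir(path: str, expected_uid: int) -> list[str]:
    """Return a list of problems found with a user runtime directory.

    Hypothetical helper: the expectations (owner == expected_uid, mode 0700)
    come from the "mode=0700,uid=1001" mount options in the journal output.
    """
    problems = []
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    if st.st_uid != expected_uid:
        problems.append(
            f"{path} is owned by uid {st.st_uid}, expected {expected_uid}"
        )
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o700:
        problems.append(f"{path} has mode {mode:04o}, expected 0700")
    return problems
```

Calling check_runtime_dir("/run/user/1001", 1001) returns an empty list when
the directory looks as expected; in the failing case described here it would
report the root ownership.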

Regarding the testing, I have tried both a full restart and a manual start,
but the result is the same. Now that I have
“Environment=XDG_RUNTIME_DIR=/run/user/%i”, I no longer need to run
“systemctl set-environment …”.

Thank you for taking your time!

Best regards,
Christopher Wong


From: Mantas Mikulėnas 
Date: Friday, 8 December 2023 at 21:53
To: Christopher Wong 
Cc: Systemd 
Subject: Re: [systemd-devel] Manual start of user@.service failed with 
permission denied
On Fri, Dec 8, 2023 at 6:53 PM Christopher Wong <christopher.w...@axis.com> wrote:
Hi Mantas,

Following your suggestion, I have done the following:

Putting the below in user@.service

[Service]
...
Environment=XDG_RUNTIME_DIR=/run/user/%i
Environment=SYSTEMD_LOG_LEVEL=debug

Putting the below in user-runtime-dir@.service

[Service]
...
Environment=SYSTEMD_LOG_LEVEL=debug

Then I have disabled the global set-log-level debug (if this is also required, 
please let me know).

Unlike set-environment, that's not global; it only affects pid1.


What I can see from the logs is that user-runtime-dir@1001.service seems to
start and mount /run/user/1001, but any further creation of directories under
this mount gets permission denied.

Dec 08 17:33:29 host systemd[1]: Created slice User Slice of UID 1001.
Dec 08 17:33:29 host systemd[1]: Starting User Runtime Directory 
/run/user/1001...
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
UNSET -> OPENING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: sd-bus: starting bus by 
connecting to /run/dbus/system_bus_socket...
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
OPENING -> AUTHENTICATING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
AUTHENTICATING -> HELLO
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Sent message 
type=method_call sender=n/a destination=org.freedesktop.DBus 
path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 
reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Got message 
type=method_return sender=org.freedesktop.DBus destination=:1.2536 path=n/a 
interface=n/a member=n/a  cookie=1 reply_cookie=1 signature=s error-name=n/a 
error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Bus n/a: changing state 
HELLO -> RUNNING
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Sent message 
type=method_call sender=n/a destination=org.freedesktop.login1 
path=/org/freedesktop/login1 interface=org.freedesktop.DBus.Properties 
member=Get cookie=2 reply_cookie=0 signature=ss error-name=n/a error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: Got message 
type=method_return sender=:1.323 destination=:1.2536 path=n/a interface=n/a 
member=n/a  cookie=15 reply_cookie=2 signature=v error-name=n/a 
error-message=n/a
Dec 08 17:33:29 host systemd-user-runtime-dir[36278]: 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 12:48, Eric Curtin  wrote:
>
> On Mon, 11 Dec 2023 at 11:51, Lennart Poettering  
> wrote:
> >
> > On Mo, 11.12.23 11:28, Eric Curtin (ecur...@redhat.com) wrote:
> >
> > > > > For the items listed above I think you can find different solutions
> > > > > which do not necessarily compromise security as much.
> > > > >
> > > > > So, in the list above you could address the latter three like this:
> > > > >
> > > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > >    loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > >    the kernel cmdline to synthesize a block device from that, which
> > > > > >    you then mount directly (without any initrd) via
> > > > > >    root=/dev/pmem0. This means your boot loader will still load the
> > > > > >    whole image into memory, but only decompress the bits actually
> > > > > >    needed. (It also has some other nice benefits I like, such as an
> > > > > >    immutable rootfs, which tmpfs-based initrds don't have.)
> > >
> > > What I am unsure about here, is the "make the bootloader load the
> > > erofs into contiguous memory" part. I wonder could we try and use the
> > > existing initramfs data as is.
> >
> > Today's initrds are packed cpio archives of an OS file system
> > hierarchy. What I proposed means you'd have to put the OS file system
> > hierarchy into an erofs image instead. Which is a trivial operation,
> > just unpack and repack.
> >
> > Note that there are two concepts of "initrd" out there.
> >
> > a) from the kernel perspective an initrd/initramfs (which both are
> >    badly named, because it's a tmpfs these days) is that packed cpio
> >    archive that is unpacked into a tmpfs, and then jumped into.
> >
> > b) from systemd's perspective an initrd is an OS image that carries an
> >    /etc/initrd-release file. If that file exists then systemd will not
> >    boot up the system regularly, but instead just prepare everything
> >    so that it can transition into some other root fs.
> >
> > Most often in real life the initrds currently qualify under both
> > definitions, but there's no reason to always do this. You can also
> > have images the kernel would consider an initrd, but systemd does not,
> > which is something we use in the "USI" concept, i.e. "unified system
> > images", which are basically UKIs (large UKIs) with a complete rootfs
> > that is the main system of the OS. And you can also do it the other
> > way round, which is potentially what I am suggesting to you here: use
> > an erofs image that would not be considered an initrd by the kernel,
> > but that systemd would consider one, and transition out of.
> >
> > > I dunno if
> > > bootloaders make much assumptions about the format of that data, worst
> > > case scenario we could encapsulate erofs in the initramfs, cpio looking
> > > data.
> >
> > boot loaders generally don't bother with the cpio, it's just "data"
> > for them. Compression algorithms have changed in the past, and it only
> > mattered that the kernel could decompress it, the boot loader doesn't care.
> >
> > > Teach the kernel not to decompress and process the whole
> > > thing and mount it like an erofs alternatively. Does this sound crazy
> > > or reasonable?
> >
> > You are re-inventing the traditional "initrd" logic of the kernel
> > which was a ramdisk (i.e. a block device /dev/ram0), that was filled
> > with some fs of your choice loaded by the boot loader.
>
> Sort of yes, but preferably using that __initramfs_start /
> initrd_start buffer as is without copying any bytes anywhere else and
> without teaching the bootloaders to do things.
>
> The "memmap=" approach you suggested sounds like what we are thinking,
> but do you think we could do this without teaching bootloaders to do
> new things?

Like could we do that with a "initrd3.0=on" karg and it just uses the
__initramfs_start and __initramfs_size to memmap? (that probably
wouldn't be the arg name, it's just for description purposes here,
maybe it's even a build time flag, etc.)

>
> Although the nice thing about a storage-init like approach is there's
> basically zero copies up front. What storage-init is trying to be, is
> a tool to just call systemd storage things, without also inheriting
> all the systemd stack.
>
> >
> > Lennart
> >
> > --
> > Lennart Poettering, Berlin
> >



Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 11:51, Lennart Poettering  wrote:
>
> On Mo, 11.12.23 11:28, Eric Curtin (ecur...@redhat.com) wrote:
>
> > > > For the items listed above I think you can find different solutions
> > > > which do not necessarily compromise security as much.
> > > >
> > > > So, in the list above you could address the latter three like this:
> > > >
> > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > >    loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > >    the kernel cmdline to synthesize a block device from that, which
> > > > >    you then mount directly (without any initrd) via
> > > > >    root=/dev/pmem0. This means your boot loader will still load the
> > > > >    whole image into memory, but only decompress the bits actually
> > > > >    needed. (It also has some other nice benefits I like, such as an
> > > > >    immutable rootfs, which tmpfs-based initrds don't have.)
> >
> > What I am unsure about here, is the "make the bootloader load the
> > erofs into contiguous memory" part. I wonder could we try and use the
> > existing initramfs data as is.
>
> Today's initrds are packed cpio archives of an OS file system
> hierarchy. What I proposed means you'd have to put the OS file system
> hierarchy into an erofs image instead. Which is a trivial operation,
> just unpack and repack.
>
> Note that there are two concepts of "initrd" out there.
>
> a) from the kernel perspective an initrd/initramfs (which both are
>    badly named, because it's a tmpfs these days) is that packed cpio
>    archive that is unpacked into a tmpfs, and then jumped into.
>
> b) from systemd's perspective an initrd is an OS image that carries an
>    /etc/initrd-release file. If that file exists then systemd will not
>    boot up the system regularly, but instead just prepare everything
>    so that it can transition into some other root fs.
>
> Most often in real life the initrds currently qualify under both
> definitions, but there's no reason to always do this. You can also
> have images the kernel would consider an initrd, but systemd does not,
> which is something we use in the "USI" concept, i.e. "unified system
> images", which are basically UKIs (large UKIs) with a complete rootfs
> that is the main system of the OS. And you can also do it the other
> way round, which is potentially what I am suggesting to you here: use
> an erofs image that would not be considered an initrd by the kernel,
> but that systemd would consider one, and transition out of.
>
> > I dunno if
> > bootloaders make much assumptions about the format of that data, worst
> > case scenario we could encapsulate erofs in the initramfs, cpio looking
> > data.
>
> boot loaders generally don't bother with the cpio, it's just "data"
> for them. Compression algorithms have changed in the past, and it only
> mattered that the kernel could decompress it, the boot loader doesn't care.
>
> > Teach the kernel not to decompress and process the whole
> > thing and mount it like an erofs alternatively. Does this sound crazy
> > or reasonable?
>
> You are re-inventing the traditional "initrd" logic of the kernel
> which was a ramdisk (i.e. a block device /dev/ram0), that was filled
> with some fs of your choice loaded by the boot loader.

Sort of yes, but preferably using that __initramfs_start /
initrd_start buffer as is without copying any bytes anywhere else and
without teaching the bootloaders to do things.

The "memmap=" approach you suggested sounds like what we are thinking,
but do you think we could do this without teaching bootloaders to do
new things?
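
To make the "memmap=" part concrete: as far as I understand the kernel's
command-line documentation, the syntax is memmap=nn[KMG]!ss[KMG], i.e. the
size of the region first, then its start address. Below is a minimal sketch of
a helper that formats such an argument. memmap_pmem_arg is a made-up name, and
the address and size in the usage note are placeholders, not values from any
real system.

```python
def memmap_pmem_arg(start: int, size: int) -> str:
    """Format a memmap=<size>!<start> kernel command-line argument.

    Hypothetical helper: it only builds the string. Choosing a start address
    that is actually free, and loading the image there, is up to the boot
    loader and is not handled here.
    """
    if start < 0:
        raise ValueError("start address must not be negative")
    if size <= 0:
        raise ValueError("size must be positive")
    # Hexadecimal keeps large addresses readable on the command line.
    return f"memmap={size:#x}!{start:#x}"
```

For example, memmap_pmem_arg(start=0x100000000, size=64 * 1024 * 1024) gives
"memmap=0x4000000!0x100000000", which would go on the kernel cmdline next to
root=/dev/pmem0.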

Although the nice thing about a storage-init like approach is there's
basically zero copies up front. What storage-init is trying to be, is
a tool to just call systemd storage things, without also inheriting
all the systemd stack.

>
> Lennart
>
> --
> Lennart Poettering, Berlin
>



Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Lennart Poettering
On Mo, 11.12.23 11:42, Eric Curtin (ecur...@redhat.com) wrote:

> I am also thinking, what is the difference between "make the
> bootloader load the erofs into contiguous memory" part and doing
> something like storage-init.

Well, from my PoV there's value in reducing the stages of the boot
process, and reducing the number of storage stacks you need in the
mix. Hence, the boot loader can load stuff from disk into memory
anyway, it always has done that, typically the kernel and the
initrd. Just swapping out the format of the initrd to get better
behaviour is relatively cheap there, and means no additional storage
logic, no additional stage of the boot. You basically only have the "boot
loader" (which loads kernel and initrd), and the "host os" (which runs
off the final rootfs).

Otoh if you let your storage-init load the initrd, then you basically
have a third step in the middle, which shares a lot of props with the
last step, but also is distinct. I mean, you probably would reinvent
your own udev and DM stack for that, to get verity in the mix (because
that depends on DM, and udev, to some degree)

In my ideal model, initrds are just part of the UKI btw, so they end
up being loaded together with the rest of the kernel, and need no
verity because they are signed along with the UKI itself.

Lennart

--
Lennart Poettering, Berlin


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Lennart Poettering
On Mo, 11.12.23 11:28, Eric Curtin (ecur...@redhat.com) wrote:

> > > For the items listed above I think you can find different solutions
> > > which do not necessarily compromise security as much.
> > >
> > > So, in the list above you could address the latter three like this:
> > >
> > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > >    loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > >    the kernel cmdline to synthesize a block device from that, which
> > > >    you then mount directly (without any initrd) via
> > > >    root=/dev/pmem0. This means your boot loader will still load the
> > > >    whole image into memory, but only decompress the bits actually
> > > >    needed. (It also has some other nice benefits I like, such as an
> > > >    immutable rootfs, which tmpfs-based initrds don't have.)
>
> What I am unsure about here, is the "make the bootloader load the
> erofs into contiguous memory" part. I wonder could we try and use the
> existing initramfs data as is.

Today's initrds are packed cpio archives of an OS file system
hierarchy. What I proposed means you'd have to put the OS file system
hierarchy into an erofs image instead. Which is a trivial operation,
just unpack and repack.
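
A rough sketch of what "unpack and repack" could look like, written as a small
Python wrapper. It assumes GNU cpio and mkfs.erofs (from erofs-utils) are
installed, and that the cpio archive has already been decompressed. The
function names and paths are purely illustrative, and extracting device nodes
typically needs root.

```python
import subprocess
from pathlib import Path


def repack_commands(tree: str, image: str) -> tuple[list[str], list[str]]:
    """Return the (unpack, repack) argv lists; nothing is executed here."""
    unpack = [
        "cpio",
        "--extract",
        "--make-directories",
        "--preserve-modification-time",
    ]
    # mkfs.erofs takes the destination image first, then the source directory.
    repack = ["mkfs.erofs", image, tree]
    return unpack, repack


def repack_initrd(cpio_archive: str, tree: str, image: str) -> None:
    """Unpack an uncompressed cpio archive into `tree`, then build an erofs image."""
    unpack, repack = repack_commands(tree, image)
    Path(tree).mkdir(parents=True, exist_ok=True)
    with open(cpio_archive, "rb") as archive:
        # cpio reads the archive on stdin and extracts relative to its cwd.
        subprocess.run(unpack, stdin=archive, cwd=tree, check=True)
    subprocess.run(repack, check=True)
```

Something like repack_initrd("initrd.cpio", "initrd-tree", "initrd.erofs")
would then produce the image to hand to the boot loader.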

Note that there are two concepts of "initrd" out there.

a) from the kernel perspective an initrd/initramfs (which both are
   badly named, because it's a tmpfs these days) is that packed cpio
   archive that is unpacked into a tmpfs, and then jumped into.

b) from systemd's perspective an initrd is an OS image that carries an
   /etc/initrd-release file. If that file exists then systemd will not
   boot up the system regularly, but instead just prepare everything
   so that it can transition into some other root fs.
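
Definition (b) can be illustrated with a tiny check. This is only a sketch
that mirrors the file test described above; looks_like_systemd_initrd is a
hypothetical name, and systemd's real logic may take more into account than
this one file.

```python
from pathlib import Path


def looks_like_systemd_initrd(root: str = "/") -> bool:
    """Return True if the OS tree at `root` carries /etc/initrd-release.

    Illustrative only: it checks for the marker file and nothing else.
    """
    return (Path(root) / "etc" / "initrd-release").exists()
```

Pointing it at an unpacked image tree, e.g.
looks_like_systemd_initrd("initrd-tree"), tells you which of the two roles
systemd would assign to that image.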

Most often in real life the initrds currently qualify under both
definitions, but there's no reason to always do this. You can also
have images the kernel would consider an initrd, but systemd does not,
which is something we use in the "USI" concept, i.e. "unified system
images", which are basically UKIs (large UKIs) with a complete rootfs
that is the main system of the OS. And you can also do it the other
way round, which is potentially what I am suggesting to you here: use
an erofs image that would not be considered an initrd by the kernel,
but that systemd would consider one, and transition out of.

> I dunno if
> bootloaders make much assumptions about the format of that data, worst
> case scenario we could encapsulate erofs in the initramfs, cpio looking
> data.

boot loaders generally don't bother with the cpio, it's just "data"
for them. Compression algorithms have changed in the past, and it only
mattered that the kernel could decompress it, the boot loader doesn't care.

> Teach the kernel not to decompress and process the whole
> thing and mount it like an erofs alternatively. Does this sound crazy
> or reasonable?

You are re-inventing the traditional "initrd" logic of the kernel
which was a ramdisk (i.e. a block device /dev/ram0), that was filled
with some fs of your choice loaded by the boot loader.

Lennart

--
Lennart Poettering, Berlin


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
I am also thinking, what is the difference between "make the
bootloader load the erofs into contiguous memory" part and doing
something like storage-init.

They are similar approaches: both introduce something in the middle to
handle the erofs.

Is mise le meas/Regards,

Eric Curtin

On Mon, 11 Dec 2023 at 11:28, Eric Curtin  wrote:
>
> On Mon, 11 Dec 2023 at 11:20, Eric Curtin  wrote:
> >
> > On Mon, 11 Dec 2023 at 10:06, Lennart Poettering  
> > wrote:
> > >
> > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > >
> > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > storage devices initialized. storage-init is a process that is not
> > > > designed to replace init, it does just enough to initialize storage
> > > > (performs a targeted udev trigger on storage), switches to
> > > > initoverlayfs as root and then executes init.
> > > >
> > > > ```
> > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > >
> > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > ```
> > >
> > > I am not sure I follow what these chains are supposed to mean? Why are
> > > there two lines?
> >
> > The top line is the filesystem transition, the bottom is more like a
> > process perspective. Will make this clearer in future.
> >
> > >
> > > So, I generally would agree that the current initrd scheme is not
> > > ideal, and we have been discussing better approaches. But I am not
> > > sure your approach really is useful on generic systems for two
> > > reasons:
> > >
> > > > 1. no security model? you need to authenticate your initrd in
> > > >    2023. There's no excuse for not doing that anymore these days. Not
> > > >    in automotive, and not anywhere else really.
> > >
> > > Yes you are right, there is no excuse, the plan was to mount using
> > > dm-verity most likely with the details from the initramfs, but
> > > admittedly we had not looked into that in great detail.
> >
> > >
> > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > >    unlock their root disks with TPM2 and similar things. People use
> > > >    RAID, LVM, and all that mess.
> > >
> > > We had 3 thoughts on this:
> > >
> > > 1. Just worry about the common use-cases and let everyone else
> > > fall back to the approaches we use today.
> > 2. Try and split up systemd to make it even smaller. We do use
> > systemd-udev in the small initramfs storage-init process so far.
> > 3. Reimplement some things? But as little as possible, on a case by
> > case basis, we certainly don't want to fall into the trap of rewriting
> > systemd that's for sure, systemd does these things very well.
> >
> > Tbh, if we try and implement this in kernelspace a lot of these
> > questions go away. You just teach the kernel to deal with the
> > filesystem image early (say erofs or whatever other filesystem) and
> > have that data where initramfs data currently is. You still pay for
> > the initial read, but you still save a bunch of kernel time.
> >
> > >
> > > Actually the above are kinda the same problem in a way: you need
> > > complex storage, but if you need that you kinda need udev, and
> > > services, and then also systemd and all that other stuff, and that's
> > > why the system works like the system works right now.
> >
> > > True, but there is also a bunch of stuff in current initrds today
> > > that isn't required to mount basic storage, but is designed around
> > > the whole idea of having an early throwaway filesystem.
> >
> > >
> > > Whenever you devise a system like yours by cutting corners, and
> > > declaring that you don't want TPM, you don't want signed initrds, you
> > > don't want to support weird storage, you just solve your problem in a
> > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > actually really work without all that and are willing to maintain the
> > > solution for your specific problem only.
> > >
> > > As I understand you are trying to solve multiple problems at once
> > > here, and I think one should start with figuring out clearly what
> > > those are before trying to address them, maybe without compromising on
> > > security. So my guess is you want to address the following:
> > >
> > > > 1. You don't want the whole big initrd to be read off disk on every
> > > >    boot, but only the parts of it that are actually needed.
> > > >
> > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > >    boot, but only the parts of it that are actually needed.
> > > >
> > > > 3. You want to share data between root fs and initrd
> > > >
> > > > 4. You want to save some boot time by not bringing up an init system
> > > >    in the initrd once, then tearing it down again, and starting it
> > > >    again from the root fs.
> >
> > It's mainly the top 3 that were the goals. And that people have the
> > freedom to consider using heavier weight generic libraries, 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 11:20, Eric Curtin  wrote:
>
> On Mon, 11 Dec 2023 at 10:06, Lennart Poettering  wrote:
> >
> > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> >
> > > Here is the boot sequence with initoverlayfs integrated, the
> > > mini-initramfs contains just enough to get storage drivers loaded and
> > > storage devices initialized. storage-init is a process that is not
> > > designed to replace init, it does just enough to initialize storage
> > > (performs a targeted udev trigger on storage), switches to
> > > initoverlayfs as root and then executes init.
> > >
> > > ```
> > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > >
> > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > ```
> >
> > I am not sure I follow what these chains are supposed to mean? Why are
> > there two lines?
>
> The top line is the filesystem transition, the bottom is more like a
> process perspective. Will make this clearer in future.
>
> >
> > So, I generally would agree that the current initrd scheme is not
> > ideal, and we have been discussing better approaches. But I am not
> > sure your approach really is useful on generic systems for two
> > reasons:
> >
> > 1. no security model? you need to authenticate your initrd in
> >    2023. There's no excuse for not doing that anymore these days. Not
> >    in automotive, and not anywhere else really.
>
> Yes you are right, there is no excuse, the plan was to mount using
> dm-verity most likely with the details from the initramfs, but
> admittedly we had not looked into that in great detail.
>
> >
> > 2. no way to deal with complex storage? i.e. people use FDE, want to
> >    unlock their root disks with TPM2 and similar things. People use
> >    RAID, LVM, and all that mess.
>
> We had 3 thoughts on this:
>
> 1. Just worry about the common use-cases and let everyone else
> fall back to the approaches we use today.
> 2. Try and split up systemd to make it even smaller. We do use
> systemd-udev in the small initramfs storage-init process so far.
> 3. Reimplement some things? But as little as possible, on a case by
> case basis, we certainly don't want to fall into the trap of rewriting
> systemd that's for sure, systemd does these things very well.
>
> Tbh, if we try and implement this in kernelspace a lot of these
> questions go away. You just teach the kernel to deal with the
> filesystem image early (say erofs or whatever other filesystem) and
> have that data where initramfs data currently is. You still pay for
> the initial read, but you still save a bunch of kernel time.
>
> >
> > Actually the above are kinda the same problem in a way: you need
> > complex storage, but if you need that you kinda need udev, and
> > services, and then also systemd and all that other stuff, and that's
> > why the system works like the system works right now.
>
> True, but there is also a bunch of stuff in initrds today
> that isn't required to mount basic storage, but is designed around
> the whole idea of having an early throwaway filesystem.
>
> >
> > Whenever you devise a system like yours by cutting corners, and
> > declaring that you don't want TPM, you don't want signed initrds, you
> > don't want to support weird storage, you just solve your problem in a
> > very specific way, ignoring the big picture. Which is OK, *if* you can
> > actually really work without all that and are willing to maintain the
> > solution for your specific problem only.
> >
> > As I understand you are trying to solve multiple problems at once
> > here, and I think one should start with figuring out clearly what
> > those are before trying to address them, maybe without compromising on
> > security. So my guess is you want to address the following:
> >
> > 1. You don't want the whole big initrd to be read off disk on every
> >    boot, but only the parts of it that are actually needed.
> >
> > 2. You don't want the whole big initrd to be fully decompressed on every
> >    boot, but only the parts of it that are actually needed.
> >
> > 3. You want to share data between root fs and initrd
> >
> > 4. You want to save some boot time by not bringing up an init system
> >    in the initrd once, then tearing it down again, and starting it
> >    again from the root fs.
>
> It's mainly the top 3 that were the goals. And that people have the
> freedom to consider using heavier weight generic libraries, tools,
> etc. if they want. You want to use Rust (or languages X, Y, Z) to
> write something early boot, go ahead! You'll only pay the cost for the
> larger binary if you actually use it. The week I started tinkering at
> this, there was a mini-debate on whether we should include glib or not
> in the initrd. And we are regularly under pressure to reduce boot time
> at the moment.
>
> Number 4 was a convenient way to do an early version of this, stick a
> process in between systemd and the kernel. But it turns out, it works
> 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Eric Curtin
On Mon, 11 Dec 2023 at 10:06, Lennart Poettering  wrote:
>
> On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
>
> > Here is the boot sequence with initoverlayfs integrated, the
> > mini-initramfs contains just enough to get storage drivers loaded and
> > storage devices initialized. storage-init is a process that is not
> > designed to replace init, it does just enough to initialize storage
> > (performs a targeted udev trigger on storage), switches to
> > initoverlayfs as root and then executes init.
> >
> > ```
> > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> >
> > fw -> bootloader -> kernel -> storage-init   -> init ->
> > ```
>
> I am not sure I follow what these chains are supposed to mean? Why are
> there two lines?

The top line is the filesystem transition, the bottom is more like a
process perspective. Will make this clearer in future.

>
> So, I generally would agree that the current initrd scheme is not
> ideal, and we have been discussing better approaches. But I am not
> sure your approach really is useful on generic systems for two
> reasons:
>
> 1. no security model? you need to authenticate your initrd in
>    2023. There's no excuse for not doing that anymore these days. Not
>    in automotive, and not anywhere else really.

Yes you are right, there is no excuse, the plan was to mount using
dm-verity most likely with the details from the initramfs, but
admittedly we had not looked into that in great detail.
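
For readers less familiar with dm-verity, the idea behind it can be sketched roughly as below: the image is cut into fixed-size blocks, each block is hashed, and the hashes are hashed again until a single root hash remains, which is the only value that has to be trusted. This is a simplified illustration only and is not compatible with the on-disk format veritysetup produces; the function name merkle_root, the 4096-byte block size and the zero padding are assumptions made for the sketch.

```python
import hashlib

BLOCK_SIZE = 4096  # data and hash block size assumed for this sketch


def merkle_root(image: bytes, salt: bytes = b"") -> str:
    """Return a hex root hash over fixed-size blocks of `image`."""
    # Pad the image to a whole number of blocks, then hash every block.
    if len(image) % BLOCK_SIZE:
        image += b"\0" * (BLOCK_SIZE - len(image) % BLOCK_SIZE)
    level = [
        hashlib.sha256(salt + image[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(image), BLOCK_SIZE)
    ] or [hashlib.sha256(salt).digest()]

    # Pack digests into hash blocks and hash those until one digest is left.
    per_block = BLOCK_SIZE // hashlib.sha256().digest_size
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), per_block):
            packed = b"".join(level[i:i + per_block]).ljust(BLOCK_SIZE, b"\0")
            next_level.append(hashlib.sha256(salt + packed).digest())
        level = next_level
    return level[0].hex()
```

If the root hash is carried in something that is itself authenticated (for example the initramfs details mentioned above), any modified block in the image changes the computed root and can be detected at read time.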

>
> 2. no way to deal with complex storage? i.e. people use FDE, want to
>    unlock their root disks with TPM2 and similar things. People use
>    RAID, LVM, and all that mess.

We had 3 thoughts on this:

1. Just worry about the common use-cases and let everyone else
fall back to the approaches we use today.
2. Try and split up systemd to make it even smaller. We do use
systemd-udev in the small initramfs storage-init process so far.
3. Reimplement some things? But as little as possible, on a case by
case basis, we certainly don't want to fall into the trap of rewriting
systemd that's for sure, systemd does these things very well.

Tbh, if we try and implement this in kernelspace a lot of these
questions go away. You just teach the kernel to deal with the
filesystem image early (say erofs or whatever other filesystem) and
have that data where initramfs data currently is. You still pay for
the initial read, but you still save a bunch of kernel time.

>
> Actually the above are kinda the same problem in a way: you need
> complex storage, but if you need that you kinda need udev, and
> services, and then also systemd and all that other stuff, and that's
> why the system works like the system works right now.

True, but there is also a bunch of stuff in initrds today
that isn't required to mount basic storage, but is designed around
the whole idea of having an early throwaway filesystem.

>
> Whenever you devise a system like yours by cutting corners, and
> declaring that you don't want TPM, you don't want signed initrds, you
> don't want to support weird storage, you just solve your problem in a
> very specific way, ignoring the big picture. Which is OK, *if* you can
> actually really work without all that and are willing to maintain the
> solution for your specific problem only.
>
> As I understand you are trying to solve multiple problems at once
> here, and I think one should start with figuring out clearly what
> those are before trying to address them, maybe without compromising on
> security. So my guess is you want to address the following:
>
> 1. You don't want the whole big initrd to be read off disk on every
>    boot, but only the parts of it that are actually needed.
>
> 2. You don't want the whole big initrd to be fully decompressed on every
>    boot, but only the parts of it that are actually needed.
>
> 3. You want to share data between root fs and initrd
>
> 4. You want to save some boot time by not bringing up an init system
>    in the initrd once, then tearing it down again, and starting it
>    again from the root fs.

It's mainly the top 3 that were the goals. And that people have the
freedom to consider using heavier weight generic libraries, tools,
etc. if they want. You want to use Rust (or languages X, Y, Z) to
write something early boot, go ahead! You'll only pay the cost for the
larger binary if you actually use it. The week I started tinkering at
this, there was a mini-debate on whether we should include glib or not
in the initrd. And we are regularly under pressure to reduce boot time
at the moment.

Number 4 was a convenient way to do an early version of this, stick a
process in between systemd and the kernel. But it turns out, it works
very well, the only problem is the reimplementation problem really.

Theoretically this could also be systemd-storage-init -> systemd. Or
systemd could dlopen more libraries as they become available later down
the line.

>
> For the items listed above I 

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Lennart Poettering
On Mo, 11.12.23 10:57, Lennart Poettering (mzerq...@0pointer.de) wrote:

> Which leaves item 1, which is a bit harder to address. We have been
> discussing this off and on internally too. A generic solution to this
> is hard. My current thinking for this could be something like this,
> covering the UEFI world: support sticking a DDI for the main initrd in
> the ESP. The ESP is per definition unencrypted and unauthenticated,
> but otherwise relatively well defined, i.e. known to be vfat and
> discoverable via UUID on a GPT disk. So: build a minimal
> single-process initrd into the kernel (i.e. UKI) that has exactly the
> storage to find a DDI on the ESP, and set it up. i.e. vfat+erofs fs
> drivers, and dm-verity. Then have a PID 1 that does exactly enough to
> jump into the rootfs stored in the ESP. That latter then has proper
> file system drivers, storage drivers, crypto stack, and can unlock the
> real root. This would still be a pretty specific solution to one set
> of devices though, as it could not cover network boots (i.e. where
> there is just no ESP to boot from), but I think this could be kept
> relatively close, as the logic in that case could just fall back into
> loading the DDI that normally would sit in the ESP fully into
> memory.
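
The "discoverable via UUID on a GPT disk" part of the quoted paragraph can be illustrated with a small sketch: each GPT partition entry carries a partition type GUID, and the UEFI specification assigns a well-known one to the EFI System Partition. The Python below only decodes a single 128-byte entry that has already been read from disk; the helper names parse_gpt_entry and is_esp are hypothetical, and a real minimal initrd would additionally have to read and validate the GPT header.

```python
import struct
import uuid

# Partition type GUID of the EFI System Partition (from the UEFI specification).
ESP_TYPE = uuid.UUID("c12a7328-f81e-11d2-ba4b-00a0c93ec93b")

# Layout of one GPT partition entry: type GUID, unique GUID, first LBA,
# last LBA, attribute flags, and a 36-character UTF-16LE name.
ENTRY_FORMAT = "<16s16sQQQ72s"


def parse_gpt_entry(entry: bytes) -> dict:
    """Decode the first 128 bytes of a GPT partition entry."""
    type_raw, part_raw, first_lba, last_lba, attrs, name_raw = struct.unpack(
        ENTRY_FORMAT, entry[:128]
    )
    return {
        # GPT stores GUIDs in mixed-endian form, which bytes_le understands.
        "type": uuid.UUID(bytes_le=type_raw),
        "uuid": uuid.UUID(bytes_le=part_raw),
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attrs": attrs,
        "name": name_raw.decode("utf-16-le").split("\0", 1)[0],
    }


def is_esp(entry: bytes) -> bool:
    """True if the entry's type GUID marks it as the EFI System Partition."""
    return parse_gpt_entry(entry)["type"] == ESP_TYPE
```

Because only the type GUID is compared, nothing about the partition's position or label has to be known in advance, which is what keeps the lookup cheap enough for a single-process initrd.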

BTW, one thing I would like to emphasize though. I think this item is
really the last thing you should focus on. If your OS never
transitions out of the initrd, and gets its payload merged in via
DDIs, then the root fs should be reasonably small enough and "fully
used at boot" (i.e. every sector read anyway) that doing this extra
work of finding a split-out DDI on the ESP is entirely unnecessary and
just a waste of time (both of developer time and boot time).

Lennart

--
Lennart Poettering, Berlin


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Lennart Poettering
On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:

> Here is the boot sequence with initoverlayfs integrated, the
> mini-initramfs contains just enough to get storage drivers loaded and
> storage devices initialized. storage-init is a process that is not
> designed to replace init, it does just enough to initialize storage
> (performs a targeted udev trigger on storage), switches to
> initoverlayfs as root and then executes init.
>
> ```
> fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
>
> fw -> bootloader -> kernel -> storage-init   -> init ->
> ```

I am not sure I follow what these chains are supposed to mean? Why are
there two lines?

So, I generally would agree that the current initrd scheme is not
ideal, and we have been discussing better approaches. But I am not
sure your approach really is useful on generic systems for two
reasons:

1. no security model? you need to authenticate your initrd in
   2023. There's no excuse for not doing that anymore these days. Not
   in automotive, and not anywhere else really.

2. no way to deal with complex storage? i.e. people use FDE, want to
   unlock their root disks with TPM2 and similar things. People use
   RAID, LVM, and all that mess.

Actually the above are kinda the same problem in a way: you need
complex storage, but if you need that you kinda need udev, and
services, and then also systemd and all that other stuff, and that's
why the system works like the system works right now.

Whenever you devise a system like yours by cutting corners, and
declaring that you don't want TPM, you don't want signed initrds, you
don't want to support weird storage, you just solve your problem in a
very specific way, ignoring the big picture. Which is OK, *if* you can
actually really work without all that and are willing to maintain the
solution for your specific problem only.

As I understand you are trying to solve multiple problems at once
here, and I think one should start with figuring out clearly what
those are before trying to address them, maybe without compromising on
security. So my guess is you want to address the following:

1. You don't want the whole big initrd to be read off disk on every
   boot, but only the parts of it that are actually needed.

2. You don't want the whole big initrd to be fully decompressed on every
   boot, but only the parts of it that are actually needed.

3. You want to share data between root fs and initrd

4. You want to save some boot time by not bringing up an init system
   in the initrd once, then tearing it down again, and starting it
   again from the root fs.

For the items listed above I think you can find different solutions
which do not necessarily compromise security as much.

So, in the list above you could address the latter three like this:

2. Use an erofs rather than a packed cpio as initrd. Make the boot
   loader load the erofs into contiguous memory, then use memmap=X!Y on
   the kernel cmdline to synthesize a block device from that, which
   you then mount directly (without any initrd) via
   root=/dev/pmem0. This means your boot loader will still load the
   whole image into memory, but only decompress the bits actually
   needed. (It also has some other nice benefits I like, such as an
   immutable rootfs, which tmpfs-based initrds don't have.)

3. Simply never transition to the root fs, don't mark the initrd in
   systemd's eyes as an initrd (specifically: don't add an
   /etc/initrd-release file to it). Instead, just merge resources of
   the root fs into your initrd fs via overlayfs. systemd has
   infrastructure for this: "systemd-sysext". It takes immutable,
   authenticated erofs images (with verity, we call them "DDIs",
   i.e. "discoverable disk images") that it overlays into /usr/. [You
   could also very nicely combine this approach with systemd's
   portable services, and nspawn containers, which operate on the same
   authenticated images]. At MSFT we have a major product that works
   exactly like this: the OS runs off a rootfs that is loaded as an
   initrd, and everything that runs on top of this are just these
   verity disk images, using overlayfs and portable services.

4. The proposal in 3 also addresses goal 4.
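
To make the memmap=X!Y suggestion in item 2 a bit more concrete, here is a rough sketch of how a boot loader or image build script might compose the two kernel command line parameters named there. As far as the kernel's command-line parameter documentation goes, the form is memmap=nn[KMG]!ss[KMG], i.e. size first, then start address. The helper names fmt_size and pmem_root_cmdline are hypothetical, and the 64M size and 4G load address in the usage line are made-up values.

```python
def fmt_size(n: int) -> str:
    """Format a byte count using the K/M/G suffixes the kernel accepts."""
    for suffix, unit in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
        if n >= unit and n % unit == 0:
            return f"{n // unit}{suffix}"
    return str(n)


def pmem_root_cmdline(image_size: int, load_addr: int) -> str:
    """Build the cmdline fragment that exposes a preloaded image as pmem0."""
    # memmap=<size>!<start> marks the region as protected (persistent) memory,
    # which the kernel then exposes as a /dev/pmemN block device.
    return f"memmap={fmt_size(image_size)}!{fmt_size(load_addr)} root=/dev/pmem0"


# Hypothetical example: a 64 MiB erofs image loaded at the 4 GiB mark.
print(pmem_root_cmdline(64 << 20, 4 << 30))
```

The region has to match where the boot loader actually placed the erofs image, so in practice the two numbers would come from the loader rather than being hard-coded.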

Which leaves item 1, which is a bit harder to address. We have been
discussing this off and on internally too. A generic solution to this
is hard. My current thinking for this could be something like this,
covering the UEFI world: support sticking a DDI for the main initrd in
the ESP. The ESP is per definition unencrypted and unauthenticated,
but otherwise relatively well defined, i.e. known to be vfat and
discoverable via UUID on a GPT disk. So: build a minimal
single-process initrd into the kernel (i.e. UKI) that has exactly the
storage to find a DDI on the ESP, and set it up. i.e. vfat+erofs fs
drivers, and dm-verity. Then have a PID 1 that does exactly enough to
jump into the rootfs stored in the ESP. That latter then has