On 9/17/25 13:00, Andy Lutomirski wrote:
On Mon, Sep 15, 2025 at 10:09 AM Rob Landley <r...@landley.net> wrote:

While you're at it, could you fix static/builtin initramfs so PID 1 has
a valid stdin/stdout/stderr?

A static initramfs won't create /dev/console if the embedded initramfs
image doesn't contain it, which a non-root build can't mknod, so the
kernel plumbing won't see it in the directory we point it at unless
we build with root access.

I have no current insight as to whether there's a kernel issue here,

They fixed the behavior in one codepath. They left it broken in the other codepath. The kernel's behavior is inconsistent.

Look:

$ mkdir sub; cc --static -xc - <<<'int main() {puts("hello\n");if (fork()) reboot(0x01234567); for(;;);}' -o sub/init
$ (cd sub; cpio -o -H newc <<<init | gzip) > sub.cpio.gz
$ make allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append console=ttyS0 -initrd sub.cpio.gz

You get a "hello" output near the end there. (You can add "quiet" to the -append but given that qemu can't NOT output its bios spam there's not much point.)

Now add INITRAMFS_SOURCE="sub" to the config and remove -initrd sub.cpio.gz from the qemu invocation:

$ make clean allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER INITRAMFS_SOURCE="sub"' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append 'console=ttyS0'

No "hello" output, but it DOES shut down cleanly instead of giving you a panic trace so you know it ran the init binary.

All that changed was statically linking the initramfs instead of feeding it in through the initrd mechanism: the kernel behaves differently in those two codepaths, as I explained in the message you replied to.

(The above instructions assume an x86-64 host toolchain, poke me if you want arm64 instead...)

but why are you trying to put actual device nodes in an actual
filesystem as part of a build process?

I'm not. Doing that would require root access on the build machine to mknod in "sub" directory above. I build new images WITHOUT root access on the host.

There used to be a way to feed the kernel config a text file listing what to make in the cpio file instead of just pointing it at a directory, and my old Aboriginal Linux build used that mechanism (generating such a file by hand, borrowing the kernel infrastructure but driving it manually) 15 years ago:

https://landley.net/aboriginal/about.html

https://github.com/landley/aboriginal/blob/master/sources/functions.sh#L403
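
For anyone who hasn't seen that text format, here is a minimal sketch of generating such a list from shell. It follows the descriptor syntax of usr/gen_init_cpio ("dir", "nod", "file" lines); the file name initramfs.list and the sub/init source path are made up for illustration and are not taken from the Aboriginal scripts:

```shell
# Describe the archive contents as text. Nothing is mknod'ed on the host,
# so this step needs no root access: the device node exists only as a line
# in the list until the cpio generator turns it into an archive entry.
cat > initramfs.list <<'EOF'
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init sub/init 0755 0 0
EOF
```

The "nod" fields are name, mode, uid, gid, type (c or b), major, minor; 5 1 is the usual /dev/console character device.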

But kernel commit 469e87e89fd6 broke that mechanism because somebody dunning-krugered it away ("I don't understand why we need this therefore nobody needs it"). I had a patch to unbreak it for a while:

https://landley.net/bin/mkroot/0.8.10/linux-patches/0011-gen_init_cpio-regression.patch

But as with so many patches, lkml wasn't interested. (I mostly post them so when copyright trolls try to rattle sabers I can point to an lkml web archive entry that got ignored, and explain precisely HOW much bad PR they're in for when they proceed.)

And again: you ONLY need this for static initramfs. Dynamic initramfs has code to create /dev/console (at boot time, not build time):

https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27

That code ONLY gets called for the external initrd loader, it does NOT get called when a static initramfs image built into the kernel has a runnable /init. This is an inconsistency in the kernel behavior, which is what I'm objecting to.

It's extremely straightforward
to emit device nodes in cpio format, and IMO it's far *more*
straightforward to do that than to make a whole directory, try to get
all the modes right, and cpio it up.

You mean like commit 595a22acee26 from 2017?

I wrote an absolutely trivial tool for this several years ago:

https://github.com/amluto/virtme/blob/master/virtme/cpiowriter.py

Let's see, I wrote the initramfs documentation in 2005:

https://lwn.net/Articles/157676/

Was already correcting kernel developers on how it actually worked (rather than theoretically worked) in 2006:

https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html

I added tmpfs support to it in 2013 (because nobody else had bothered for EIGHT YEARS):

https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

I've maintained my own cpio implementation in toybox for over a decade:

https://github.com/landley/toybox/commit/a2d558151a63

The successor to aboriginal (above) is a 400 line bash script that builds a dozen architectures that each boot to a shell prompt in qemu:

https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
https://landley.net/bin/mkroot/latest/

With automated regression test infrastructure to boot them all under qemu and confirm that it runs, the clocks are set right, the network works, and it can read from -hda:

https://github.com/landley/toybox/blob/master/mkroot/testroot.sh

So yes I _can_ create my own bespoke C program to modify the file in arbitrary ways, I have my reasons not to do that, and have thought about them for a while now.

it would be barely more complicated to strip the trailer off a cpio
file from some other source, add some device nodes, and stick the
trailer back on.

So you're unaware that the kernel accepts concatenated archives, and you can just cat together two cpio.gz files and they'll extract. (In gzip anyway, I haven't tested the other compression formats. That's why I needed to do https://github.com/landley/toybox/commit/dafb9211c777 and 95a15d238120 by the way.)
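
A rough sketch of what that concatenation looks like from shell. The helper name concat_initramfs and the archive names in the usage line are hypothetical; the only thing actually demonstrated here is that gzip treats concatenated members as one stream:

```shell
# Hypothetical helper: join several gzipped cpio archives into one image
# by plain concatenation, with no unpacking or repacking.
concat_initramfs() {
  out=$1; shift
  cat "$@" > "$out"
}

# Sanity check on the gzip side: two separately compressed members,
# concatenated, decompress as a single stream.
{ printf 'first\n' | gzip; printf 'second\n' | gzip; } > both.gz
gzip -dc both.gz   # prints "first" then "second"
```

Usage would be something like "concat_initramfs combined.cpio.gz sub.cpio.gz extra.cpio.gz", where extra.cpio.gz is whatever second archive you want layered on top.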

The problem is there's no portable existing userspace tool to create a cpio archive from non-filesystem data. Partly because there WAS a mechanism built into the kernel... until that guy broke it in 2020. When I'm making a squashfs I've got the -p option (presumably modeled on what the kernel used to do before it broke), but the host cpio hasn't got a way to specify that and adding my own bespoke format to toybox... I'm still trying to get https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html into coreutils. (Alas lkml isn't the only 30 year old community that's gotten stiff and hard of hearing.)

I could emit cpio contents with xxd -r from a HERE document hexdump or something to append to the generated file, but xxd isn't installed by default on debian and echo \x is WAY ugly, and "here's a giant hex dump you're not expected to understand" isn't really something I want to add to an otherwise understandable build. Writing, building, and running my own bespoke tool in C to do it isn't really an improvement over the hexdump.
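
For comparison, a sketch of the printf route. The newc header is ASCII hex (the magic 070701 followed by thirteen 8-digit fields), so the only non-text bytes are the NUL after the name and the padding. This assumes a printf that accepts octal arguments and a '\0' escape, as bash and dash do; the function name cpio_entry is made up and is not something in toybox or the kernel tree:

```shell
# Emit one newc entry that carries no file data (directory, device node,
# or the trailer). usage: cpio_entry NAME MODE RDEVMAJOR RDEVMINOR
# MODE is octal and includes the file type bits (040000 dir, 020000 chardev).
cpio_entry() {
  name=$1
  namesize=$(( ${#name} + 1 ))   # name plus its trailing NUL (ASCII names only)
  # field order: ino mode uid gid nlink mtime filesize
  #              devmajor devminor rdevmajor rdevminor namesize check
  printf '070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%s\0' \
    0 "$2" 0 0 1 0 0 0 0 "$3" "$4" "$namesize" 0 "$name"
  # pad the 110-byte header plus name out to a 4-byte boundary
  pad=$(( (4 - (110 + namesize) % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do printf '\0'; pad=$(( pad - 1 )); done
}

{
  cpio_entry dev 040755 0 0
  cpio_entry dev/console 020600 5 1
  cpio_entry 'TRAILER!!!' 0 0 0
} | gzip > dev.cpio.gz
```

That's shorter than a hexdump, but it's still a hand-rolled archive writer living in the build script, with the field order and padding rules to get wrong.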

The kernel ALMOST already does this. The code just needs to be refactored a bit, preferably so there aren't two codepaths each with half the testing.

But it's also really, really, really easy to emit an
entire, functioning cpio-formatted initramfs from plain user code with
no filesystem manipulation at all.  This also makes that portion of
the build reproducible, which is worth quite a bit IMO.

Sigh. When I started working on reproducible builds they weren't called that yet, but I don't think digging for more links would help here. I did do a rollup of what I'm trying to accomplish 5 years ago though http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html and long long ago, there was https://landley.net/aboriginal/history.html and...

Query: is your "plain user code" built with "cc"? Do you reliably have a "cc" link, or do you need to explicitly say "gcc" or "clang"? The kernel needs to do the latter for some reason, and my patch to GET the kernel to at least _try_ "cc" before falling back to the others was explicitly rejected...

--Andy

Rob
