On 9/17/25 13:00, Andy Lutomirski wrote:
On Mon, Sep 15, 2025 at 10:09 AM Rob Landley <r...@landley.net> wrote:
While you're at it, could you fix static/builtin initramfs so PID 1 has
a valid stdin/stdout/stderr?
A static initramfs won't create /dev/console if the embedded initramfs
image doesn't contain it, which a non-root build can't mknod, so the
kernel plumbing won't see the device node in the directory we point it
at unless we build with root access.
I have no current insight as to whether there's a kernel issue here,
They fixed the behavior in one codepath. They left it broken in the
other codepath. The kernel's behavior is inconsistent.
Look:
$ mkdir sub; cc --static -xc - <<<'int main() {puts("hello\n");if
(fork()) reboot(0x01234567); for(;;);}' -o sub/init
$ (cd sub; cpio -o -H newc <<<init | gzip) > sub.cpio.gz
$ make allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1
RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250
SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER' | sed \
's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot \
-append console=ttyS0 -initrd sub.cpio.gz
You get a "hello" output near the end there. (You can add "quiet" to the
-append but given that qemu can't NOT output its bios spam there's not
much point.)
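For anyone decoding the KCONFIG_ALLCONFIG pipeline above: it splits the symbol list on whitespace, prefixes each word with CONFIG_, and appends =y to any word that doesn't already carry a value. A rough Python equivalent, purely as illustration (the function name is made up here and is not part of any kernel tooling):

```python
def to_kconfig_lines(symbols):
    """Mimic the tr | sed pipeline: one CONFIG_ line per whitespace-separated word."""
    lines = []
    for word in symbols.split():
        line = "CONFIG_" + word
        # Words that already carry a value (PANIC_TIMEOUT=1) are left alone;
        # bare symbols are switched on with =y.
        if "=" not in line:
            line += "=y"
        lines.append(line)
    return lines


if __name__ == "__main__":
    symbols = ("PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK "
               "64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER")
    print("\n".join(to_kconfig_lines(symbols)))
```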
Now add INITRAMFS_SOURCE="sub" to the config and remove -initrd
sub.cpio.gz from the qemu invocation:
$ make clean allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n \
<<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT
SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER
INITRAMFS_SOURCE="sub"' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot \
-append 'console=ttyS0'
No "hello" output, but it DOES shut down cleanly instead of giving you a
panic trace so you know it ran the init binary.
All that changed was statically linking the initramfs instead of feeding
it in through the initrd mechanism: the kernel behaves differently in
those two codepaths, as I explained in the message you replied to.
(The above instructions assume an x86-64 host toolchain, poke me if you
want arm64 instead...)
but why are you trying to put actual device nodes in an actual
filesystem as part of a build process?
I'm not. Doing that would require root access on the build machine to
mknod in the "sub" directory above. I build new images WITHOUT root
access on the host.
There used to be a way to feed the kernel config a text file listing
what to make in the cpio file instead of just pointing it at a
directory, and my old Aboriginal Linux build used that mechanism
(generating such a file by hand, borrowing the kernel infrastructure but
driving it manually) 15 years ago:
https://landley.net/aboriginal/about.html
https://github.com/landley/aboriginal/blob/master/sources/functions.sh#L403
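As a sketch of what such a text file looks like: the kernel's usr/gen_init_cpio reads one entry per line (file, dir, slink, nod, and so on), so a device node is just a line of text and needs no mknod on the build host. The Python below is only an illustration of generating that list from a directory plus a couple of extra lines. The function name is hypothetical, forcing every uid/gid to 0 is an assumption, and it makes no claim about which kernel versions still accept such a file through the build config.

```python
import os
import stat


def cpio_list(root, extra):
    """Describe the tree under `root` in gen_init_cpio's text format.

    `extra` is a list of ready-made lines (e.g. device nodes) appended at the end.
    Ownership is forced to 0:0 (an assumption made for this sketch).
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else "/" + rel
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            mode = stat.S_IMODE(st.st_mode)
            target = f"{prefix}/{name}"
            if stat.S_ISLNK(st.st_mode):
                lines.append(f"slink {target} {os.readlink(path)} 777 0 0")
            elif stat.S_ISDIR(st.st_mode):
                lines.append(f"dir {target} {mode:o} 0 0")
            elif stat.S_ISREG(st.st_mode):
                lines.append(f"file {target} {path} {mode:o} 0 0")
    return lines + list(extra)


if __name__ == "__main__":
    # "sub" is the directory from the example earlier in this message.
    console = ["dir /dev 755 0 0", "nod /dev/console 600 0 0 c 5 1"]
    print("\n".join(cpio_list("sub", console)))
```

The nod line carries the type and major/minor numbers (c 5 1 for /dev/console), which is the part a non-root build cannot put in a real directory.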
But kernel commit 469e87e89fd6 broke that mechanism because somebody
dunning-krugered it away ("I don't understand why we need this therefore
nobody needs it"). I had a patch to unbreak it for a while:
https://landley.net/bin/mkroot/0.8.10/linux-patches/0011-gen_init_cpio-regression.patch
But as with so many patches, lkml wasn't interested. (I mostly post them
so when copyright trolls try to rattle sabers I can point to an lkml web
archive entry that got ignored, and explain precisely HOW much bad PR
they're in for when they proceed.)
And again: you ONLY need this for static initramfs. Dynamic initramfs
has code to create /dev/console (at boot time, not build time):
https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27
That code ONLY gets called for the external initrd loader, it does NOT
get called when a static initramfs image built into the kernel has a
runnable /init. This is an inconsistency in the kernel behavior, which
is what I'm objecting to.
It's extremely straightforward
to emit device nodes in cpio format, and IMO it's far *more*
straightforward to do that than to make a whole directory, try to get
all the modes right, and cpio it up.
You mean like commit 595a22acee26 from 2017?
I wrote an absolutely trivial tool for this several years ago:
https://github.com/amluto/virtme/blob/master/virtme/cpiowriter.py
Let's see, I wrote the initramfs documentation in 2005:
https://lwn.net/Articles/157676/
Was already correcting kernel developers on how it actually worked
(rather than theoretically worked) in 2006:
https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html
I added tmpfs support to it in 2013 (because nobody else had bothered
for EIGHT YEARS):
https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html
I've maintained my own cpio implementation in toybox for over a decade:
https://github.com/landley/toybox/commit/a2d558151a63
The successor to aboriginal (above) is a 400 line bash script that
builds a dozen architectures that each boot to a shell prompt in qemu:
https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
https://landley.net/bin/mkroot/latest/
With automated regression test infrastructure to boot them all under
qemu and confirm that it runs, the clocks are set right, the network
works, and it can read from -hda:
https://github.com/landley/toybox/blob/master/mkroot/testroot.sh
So yes I _can_ create my own bespoke C program to modify the file in
arbitrary ways, I have my reasons not to do that, and have thought about
them for a while now.
it would be barely more complicated to strip the trailer off a cpio
file from some other source, add some device nodes, and stick the
trailer back on.
So you're unaware that the kernel accepts concatenated archives, and you
can just cat together two cpio.gz files and they'll extract. (In gzip
anyway, I haven't tested the other compression formats. That's why I
needed to do https://github.com/landley/toybox/commit/dafb9211c777 and
95a15d238120 by the way.)
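To make the concatenation approach concrete, here is a minimal sketch of a second archive holding only /dev and /dev/console in the newc format, which could then be gzipped and appended with cat. This is illustrative Python, not code from toybox, virtme, or the kernel; the function names and the inode numbers are arbitrary choices made for the example, and it has not been boot-tested here.

```python
import gzip
import stat


def newc_entry(name, mode, rdev=(0, 0), data=b"", ino=0, nlink=1):
    """Build one newc cpio entry: 110-byte ASCII-hex header, name, data."""
    name_bytes = name.encode() + b"\0"
    fields = [
        ino, mode, 0, 0, nlink, 0,   # ino, mode, uid, gid, nlink, mtime
        len(data), 0, 0,             # filesize, devmajor, devminor
        rdev[0], rdev[1],            # rdevmajor, rdevminor (the node's numbers)
        len(name_bytes), 0,          # namesize (including NUL), check
    ]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)   # header + name padded to 4 bytes
    out += data
    out += b"\0" * (-len(out) % 4)   # data padded to 4 bytes
    return out


def console_archive():
    """An archive with /dev, /dev/console (char 5,1) and the trailer."""
    return b"".join([
        newc_entry("dev", stat.S_IFDIR | 0o755, ino=1, nlink=2),
        newc_entry("dev/console", stat.S_IFCHR | 0o600, rdev=(5, 1), ino=2),
        newc_entry("TRAILER!!!", 0),
    ])


if __name__ == "__main__":
    # mtime=0 keeps the gzip header free of timestamps (Python 3.8+).
    with open("console.cpio.gz", "wb") as f:
        f.write(gzip.compress(console_archive(), mtime=0))
```

With that file written, the concatenation described above would be something like cat sub.cpio.gz console.cpio.gz > combined.cpio.gz.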
The problem is there's no portable existing userspace tool to create a
cpio archive from non-filesystem data. Partly because there WAS a
mechanism built into the kernel... until that guy broke it in 2020. When
I'm making a squashfs I've got the -p option (presumably modeled on what
the kernel used to do before it broke), but the host cpio hasn't got a
way to specify that and adding my own bespoke format to toybox... I'm
still trying to get
https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html into
coreutils. (Alas lkml isn't the only 30 year old community that's gotten
stiff and hard of hearing.)
I could emit cpio contents with xxd -r from a HERE document hexdump or
something to append to the generated file, but xxd isn't installed by
default on debian and echo \x is WAY ugly, and "here's a giant hex dump
you're not expected to understand" isn't really something I want to add
to an otherwise understandable build. Writing, building, and running my
own bespoke tool in C to do it isn't really an improvement over the hexdump.
The kernel ALMOST already does this. The code just needs to be
refactored a bit, preferably so there aren't two codepaths each with
half the testing.
But it's also really, really, really easy to emit an
entire, functioning cpio-formatted initramfs from plain user code with
no filesystem manipulation at all. This also makes that portion of
the build reproducible, which is worth quite a bit IMO.
Sigh. When I started working on reproducible builds they weren't called
that yet, but I don't think digging for more links would help here. I
did do a rollup of what I'm trying to accomplish 5 years ago though
http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html
and long long ago, there was https://landley.net/aboriginal/history.html
and...
Query: is your "plain user code" built with "cc"? Do you reliably have a
"cc" link, or do you need to explicitly say "gcc" or "clang"? The kernel
needs to do the latter for some reason, and my patch to GET the kernel
to at least _try_ "cc" before falling back to the others was explicitly
rejected...
--Andy
Rob