On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
> 
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
> 
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
> 
> The normal way to do that is with a simple switch() statement, which
> makes the compiler able to do a much better job. Not just for the
> state selector - maybe it picks that function pointer approach, but
> probably these days just direct comparisons - but simply to do things
> like inline all those "it's used in one place" cases entirely. In
> fact, at that point, a lot of the state machine changes may end up
> turning into pure goto's - compilers are sometimes good at that
> (because state machines have often been very timing-critical).
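
(As a standalone illustration of the two dispatch styles, here is a toy
state machine, not the initramfs code, that can be fed to "gcc -O2 -S"
to compare what the compiler generates for an indirect-call table versus
a switch; the names and the 10-iteration loop are made up for the
example.)

/* dispatch-toy.c: toy state machine, dispatched two ways.
 * Build with e.g. "gcc -O2 -Wall -S dispatch-toy.c" and compare the
 * code generated for run_table() vs run_switch().  Purely
 * illustrative, not the initramfs parser. */
#include <stdio.h>

enum state { Start, Collect, Done };

static enum state state;
static int count;

static int do_start(void)   { state = Collect; return 0; }
static int do_collect(void) { if (++count >= 10) state = Done; return 0; }
static int do_done(void)    { return 1; }  /* non-zero ends the loop */

/* Style 1: indirect calls through a table, like the old actions[]. */
static int (*const actions[])(void) = {
	[Start]   = do_start,
	[Collect] = do_collect,
	[Done]    = do_done,
};

static int run_table(void)
{
	while (!actions[state]())
		;
	return count;
}

/* Style 2: switch dispatch; the compiler can inline all the helpers. */
static int run_switch(void)
{
	int ret = 0;	/* initialized to keep -Wmaybe-uninitialized quiet */

	do {
		switch (state) {
		case Start:   ret = do_start();   break;
		case Collect: ret = do_collect(); break;
		case Done:    ret = do_done();    break;
		}
	} while (!ret);

	return count;
}

int main(void)
{
	state = Start; count = 0;
	printf("table:  %d\n", run_table());

	state = Start; count = 0;
	printf("switch: %d\n", run_switch());

	return 0;
}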

FWIW, I tried doing

--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -401,24 +401,26 @@ static int __init do_symlink(void)
        return 0;
 }

-static __initdata int (*actions[])(void) = {
-       [Start]         = do_start,
-       [Collect]       = do_collect,
-       [GotHeader]     = do_header,
-       [SkipIt]        = do_skip,
-       [GotName]       = do_name,
-       [CopyFile]      = do_copy,
-       [GotSymlink]    = do_symlink,
-       [Reset]         = do_reset,
-};
-
 static long __init write_buffer(char *buf, unsigned long len)
 {
+       int ret;
+
        byte_count = len;
        victim = buf;

-       while (!actions[state]())
-               ;
+       do {
+               switch (state) {
+               case Start: ret = do_start(); break;
+               case Collect: ret = do_collect(); break;
+               case GotHeader: ret = do_header(); break;
+               case SkipIt: ret = do_skip(); break;
+               case GotName: ret = do_name(); break;
+               case CopyFile: ret = do_copy(); break;
+               case GotSymlink: ret = do_symlink(); break;
+               case Reset: ret = do_reset(); break;
+               }
+       } while (!ret);
+
        return len - byte_count;
 }


and yes, all the do_* functions get inlined into write_buffer, with a
net space saving:

add/remove: 0/9 grow/shrink: 1/0 up/down: 1696/-2112 (-416)
Function                                     old     new   delta
write_buffer                                 100    1796   +1696
actions                                       32       -     -32
do_start                                      52       -     -52
do_reset                                     112       -    -112
do_skip                                      128       -    -128
do_collect                                   144       -    -144
do_symlink                                   172       -    -172
do_copy                                      304       -    -304
do_header                                    572       -    -572
do_name                                      596       -    -596
Total: Before=5360, After=4944, chg -7.76%

(ppc32 build). But the unpacking still takes the same amount of time;
it might be different on x86.
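
(If anyone wants to check where the time actually goes, a rough,
untested sketch would be to bracket the existing unpack_to_rootfs()
call in populate_rootfs() with ktime_get(), something like:

	/* needs <linux/ktime.h>; printk is purely illustrative */
	ktime_t t0 = ktime_get();

	err = unpack_to_rootfs((char *)initrd_start,
			       initrd_end - initrd_start);

	pr_info("initramfs: unpack took %lld ms\n",
		ktime_ms_delta(ktime_get(), t0));

with the variable names taken from init/initramfs.c.)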

Rasmus
