Hello Leonid,

This is very surprising.

As far as I can tell, BOOT_BSS is section .boot.bss and that just goes
to the normal BSS region, as src/arch/arm/common_arm.lds only has:

    .boot . : AT(ADDR(.boot) - KERNEL_OFFSET)
    {
        *(.boot.text)
        *(.boot.rodata)
        *(.boot.data)
        . = ALIGN(64K);
    }

    ki_boot_end = .;

Even if it would go in a different section, the kernel itself doesn't
treat it differently other than re-using the memory when done booting.

I think that it works by chance if you remove the BOOT_BSS, except if
I missed something somewhere.

My guess is that Elfloader doesn't properly synchronise data across cores before jumping to the kernel. Perhaps it makes an assumption which is not
true for Neoverse.

Does adding extra synchronisation before starting the other cores solve
your problem? Add it to smp_boot(), before the init_cpus() call and move
the "non_boot_lock = 1;" to before the DSB or whatever sync you add:

https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/smp_boot.c

Greetings,

Indan

On 2024-05-09 15:29, Leonid Meyerovich wrote:
-BOOT_BSS static volatile int node_boot_lock;
+static volatile int node_boot_lock;

In seL4 kernel code try_init_kernel_secondary_core every core waits on
node_boot_lock to start initialization. As you can see above
node_boot_lock has been defined as BOOT_BSS is a specific usage
context of the .bss section during the boot process. Apparently, the
Morello processor introduces some unique features and behaviors that
might affect how memory regions like the BOOT_BSS section handles
cache coherence in a multi-threaded environment. My research shows
that sometimes threads don't see node_boot_lock changes and don't
start initialization. I have removed BOOT_BSS and it is booted
properly now

Thanks,
Leonid

On Fri, May 3, 2024 at 8:13 AM Indan Zupancic <in...@nul.nu> wrote:

Hello Leonid,

On 2024-04-25 21:37, Leonid Meyerovich wrote:
I run my seL4 application on 4 core ARMv8.

Still the new platform port I assume?

Recently I have developed some more application code (I mention it

because
file system size is increased an I believe it may change boot
time) and
from time to time I observed that boot process failed (stuck after
the
following print)

Does this only happen with a bigger application? And is it always
stuck
for a specific binary, or does the same binary sometimes work?

The application isn't started until after the kernel is done
booting,
so its size should not affect cores coming up or not. If it does, it
may be a bug in Elfloader. If that is the case, it should either
always succeed or always fail.

Some time I see that only 2 CPU have been booted (like CPU1 and
CPU3,
or CPU2 and CPU3)
Last print is the last one in init_kernel() function

Core 0 will only finish if all other cores have done their init too.

I recommend adding some debugging to
try_init_kernel_secondary_core()
and init_cpu() to see what where it gets stuck.

Also be aware that debug printing may disappear if multiple cores
print at exactly the same time, depending on the UART driver.

Greetings,

Indan
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems

Reply via email to