On Sun, May 24, 2026 at 4:43 AM Paulo Duarte <[email protected]> wrote:
>
> dtb_prop_read_cells(prop, 2, off) needs to read an 8-byte big-endian
> value from a property payload whose alignment is only 4 bytes (DTB
> property data is 4-aligned by spec).  The imported read_cells() did
> this with:
>
>         uint64_t tmp;
>         __builtin_memcpy(&tmp, addr, 8);
>         return __builtin_bswap64(tmp);
>
> GCC 14 with -O2 fuses the 8-byte memcpy into a single `LDR Xt`,
> which is correct for a 4-aligned source under normal cacheable
> memory attributes.  But early_dtb_walk() runs before
> pmap_bootstrap() turns on the MMU — at that point the CPU treats
> all memory as Device-nGnRnE, where an 8-byte load against a 4-byte-
> aligned address takes an Unaligned Access fault.  The kernel hangs
> in EL1 before the first DTB cell is even printed.

Okay...

>
> Split the 2-cell case into two explicit 4-byte loads + a shift:
>
>         const dtb_uint32_t *p = addr;
>         return ((uint64_t) be32toh(p[0]) << 32) | be32toh(p[1]);
>
> The compiler can't (and shouldn't) fuse these back into LDR Xt
> since they're independent 4-byte loads at known offsets.

...why can't it? What would prevent it from doing so? These are still
just memory loads, and the problematic instruction is a perfectly fine
way to do memory loads as far as the compiler is concerned, right?
(Was this suggested by an LLM?)
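To make that concrete, here is a host-side sketch. The function names are hypothetical, and be32_to_host() stands in for be32toh(), assuming GCC/Clang's __BYTE_ORDER__ and __builtin_bswap32. As far as C is concerned, the split form and a single 8-byte big-endian decode of the same bytes are the same function:

```c
#include <stdint.h>

/* Hypothetical stand-in for be32toh(); assumes GCC or Clang, which
 * predefine __BYTE_ORDER__ and provide __builtin_bswap32. */
static uint32_t be32_to_host(uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap32(v);
#else
    return v;
#endif
}

/* The patch's form: two 4-byte loads combined with a shift and an or. */
static uint64_t read_2cells_split(const uint32_t *p)
{
    return ((uint64_t)be32_to_host(p[0]) << 32) | be32_to_host(p[1]);
}

/* One 8-byte big-endian value decoded from the same address. Written
 * byte by byte so it has no alignment requirement on the host. */
static uint64_t read_be64(const void *addr)
{
    const unsigned char *b = addr;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}
```

Nothing observable distinguishes the two. An optimizer that recognizes the shift/or byte-swap idiom is therefore free to compile read_2cells_split() down to one 64-bit load plus a byte reverse (LDR Xt + REV on AArch64), which is exactly the sequence the patch is trying to avoid.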

What you really want, to prevent issues like this, is either to build
this function (or perhaps this whole file) with a compiler option that
explicitly tells it not to make those sorts of assumptions or emit
those instructions (on AArch64, GCC and Clang have -mstrict-align for
this), or, as a simpler hack, to use volatile.
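The volatile variant could look roughly like this. It is a sketch: the names are hypothetical, and be32_to_host() again assumes GCC/Clang. With the MMU off, it works only because each 4-byte load is itself 4-byte aligned, which Device memory tolerates:

```c
#include <stdint.h>

/* Hypothetical stand-in for be32toh(); assumes GCC or Clang. */
static uint32_t be32_to_host(uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap32(v);
#else
    return v;
#endif
}

/* Hypothetical replacement for the 2-cell case. Each read through a
 * volatile-qualified lvalue is kept as its own access; in practice GCC
 * and Clang do not merge such reads into one wider load. */
static uint64_t read_2cells_volatile(const void *addr)
{
    const volatile uint32_t *p = addr;
    uint32_t hi = p[0];  /* first 4-byte load */
    uint32_t lo = p[1];  /* second, separate 4-byte load */
    return ((uint64_t)be32_to_host(hi) << 32) | be32_to_host(lo);
}
```

Strictly, the C standard leaves what constitutes a volatile access implementation-defined. This approach therefore relies on compiler behaviour rather than a guarantee, which is why it's the hack. The option route states the actual constraint (no unaligned or widened accesses) to the compiler instead.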
