dtb_prop_read_cells(prop, 2, off) needs to read an 8-byte big-endian
value from a property payload whose alignment is only 4 bytes (DTB
property data is 4-aligned by spec).  The imported read_cells() did
this with:

        uint64_t tmp;
        __builtin_memcpy(&tmp, addr, 8);
        return __builtin_bswap64(tmp);

GCC 14 with -O2 fuses the 8-byte memcpy into a single `LDR Xt`,
which is correct for a 4-aligned source under normal cacheable
memory attributes.  But early_dtb_walk() runs before
pmap_bootstrap() turns on the MMU.  At that point the CPU treats
all data accesses as Device-nGnRnE, where an 8-byte load from an
address that is only 4-byte aligned takes an Alignment fault.  The
kernel hangs in EL1 before the first DTB cell is even printed.

Split the 2-cell case into two explicit 4-byte loads + a shift:

        const dtb_uint32_t *p = addr;
        return ((uint64_t) be32toh(p[0]) << 32) | be32toh(p[1]);

These are two independent 4-byte loads at known offsets through a
4-byte-aligned type, so the compiler is not expected to fuse them
back into a single LDR Xt.
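
For illustration only, here is a minimal standalone sketch of the same
idea, not the code in device/dtb.c.  The names cell_to_host() and
read_two_cells() are hypothetical, and the byte swap is written by hand
instead of using be32toh() so that the example builds with any hosted C
compiler on either endianness.  It makes no claim about the
instructions a particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert one big-endian cell, already loaded as a
 * native uint32_t, to host order by reading its bytes in memory order.
 */
static uint32_t cell_to_host(uint32_t raw)
{
    const unsigned char *b = (const unsigned char *)&raw;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * Hypothetical 2-cell reader: two separate 32-bit loads, combined with
 * a shift.  The source only has to be 4-byte aligned.
 */
static uint64_t read_two_cells(const uint32_t *p)
{
    /* DTB property payloads are 4-byte aligned; nothing more is assumed. */
    assert(((uintptr_t)(const void *)p % 4) == 0);

    return ((uint64_t)cell_to_host(p[0]) << 32) | cell_to_host(p[1]);
}
```

Calling read_two_cells() on a pointer that is 4-byte but not 8-byte
aligned is the case the patch cares about: the high cell ends up in
bits 63..32 and the low cell in bits 31..0.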

This bug fires on every modern arm64 DTB.  Any reg property whose
parent has `#address-cells = 2` or `#size-cells = 2` (i.e., virtually
all of them) triggers the 2-cell path.
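
To show why such reg properties reach the 2-cell path, the sketch
below decodes the first (address, size) entry of a reg payload whose
cell counts come from the parent node.  All names (load_be32,
sketch_read_cells, sketch_first_reg, struct sketch_reg) are
hypothetical and this is not the kernel's parser; it assumes cell
counts of 1 or 2 only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble one big-endian 32-bit cell from its bytes in memory order. */
static uint32_t load_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Consume `cells` cells (assumed 1 or 2) at *off and advance *off. */
static uint64_t sketch_read_cells(const unsigned char *payload,
                                  unsigned cells, size_t *off)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < cells; i++) {
        v = (v << 32) | load_be32(payload + *off);
        *off += 4;
    }
    return v;
}

struct sketch_reg {
    uint64_t addr;
    uint64_t size;
};

/* Decode the first (address, size) entry of a reg payload. */
static struct sketch_reg sketch_first_reg(const unsigned char *payload,
                                          unsigned address_cells,
                                          unsigned size_cells)
{
    size_t off = 0;
    struct sketch_reg r;

    r.addr = sketch_read_cells(payload, address_cells, &off);
    r.size = sketch_read_cells(payload, size_cells, &off);
    return r;
}
```

With `#address-cells = 2` and `#size-cells = 2`, each reg entry is 16
bytes and both fields are read as 2-cell values.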
---
 device/dtb.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/device/dtb.c b/device/dtb.c
index 529dae5e..05485ded 100644
--- a/device/dtb.c
+++ b/device/dtb.c
@@ -346,7 +346,7 @@ static uint64_t read_cells(
        unsigned short  size,
        vm_size_t       *off)
 {
-       uint64_t        tmp;
+       const dtb_uint32_t      *p;
 
        addr = (const unsigned char *) addr + *off;
        *off += size * 4;
@@ -357,8 +357,19 @@ static uint64_t read_cells(
                case 1:
                        return be32toh(*(const dtb_uint32_t *) addr);
                case 2:
-                       __builtin_memcpy(&tmp, addr, 8);
-                       return __builtin_bswap64(tmp);
+                       /*
+                        * DTB property data is only 4-byte aligned, but a
+                        * naive 8-byte memcpy gets fused by the compiler
+                        * into a single LDR Xt. With the MMU still off
+                        * during early boot all memory is treated as
+                        * Device-nGnRnE, where an 8-byte load against a
+                        * 4-byte-aligned address takes an alignment fault.
+                        * Read the two cells separately so the compiler is
+                        * forced to emit 4-byte loads.
+                        */
+                       p = addr;
+                       return ((uint64_t) be32toh(p[0]) << 32)
+                               | be32toh(p[1]);
                default:
                        panic("Unimplemented cell size: %d\n", size);
        }
-- 
2.54.0

