dtb_prop_read_cells(prop, 2, off) needs to read an 8-byte big-endian
value from a property payload whose alignment is only 4 bytes (DTB
property data is 4-aligned by spec). The imported read_cells() did
this with:

    uint64_t tmp;
    __builtin_memcpy(&tmp, addr, 8);
    return __builtin_bswap64(tmp);

GCC 14 with -O2 fuses the 8-byte memcpy into a single `LDR Xt`,
which is fine for a 4-byte-aligned source in Normal memory with
alignment checking disabled (SCTLR_EL1.A == 0). But early_dtb_walk()
runs before pmap_bootstrap() turns on the MMU, and with the MMU off
the CPU treats all data accesses as Device-nGnRnE. On Device memory,
an 8-byte load from an address that is 4-byte but not 8-byte aligned
takes an Alignment fault. The kernel hangs in EL1 before the first
DTB cell is even printed.
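
To make the failure condition concrete, here is a small model of it in
C. This is not code from the patch: device_access_faults() and
two_cell_read_faults() are hypothetical helpers that encode the rule
described above (a Device-nGnRnE access faults unless its address is a
multiple of its size) and compare one 8-byte load against two 4-byte
loads at the same payload offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: with the MMU off every data access is
 * Device-nGnRnE, and a Device access must be naturally aligned
 * (address a multiple of the access size) or it takes an
 * Alignment fault.
 */
static bool device_access_faults(uintptr_t addr, unsigned size)
{
	/* Plain arm64 loads use power-of-two access sizes. */
	assert(size == 1 || size == 2 || size == 4 || size == 8);
	return (addr & (uintptr_t)(size - 1)) != 0;
}

/*
 * Hypothetical helper: does reading a 2-cell value at payload + off
 * fault?  `split` selects two 4-byte loads (the fix) instead of one
 * 8-byte load (the fused memcpy).  Property payloads are only
 * guaranteed 4-byte alignment, and off is a multiple of 4.
 */
static bool two_cell_read_faults(uintptr_t payload, unsigned off, bool split)
{
	uintptr_t addr = payload + off;

	if (!split)
		return device_access_faults(addr, 8);	/* one LDR Xt */

	/* two LDR Wt, at addr and addr + 4 */
	return device_access_faults(addr, 4) ||
	       device_access_faults(addr + 4, 4);
}
```

For a value that starts 4 bytes past an 8-byte boundary, the single
8-byte load faults while the two 4-byte loads do not; when the value
happens to be 8-byte aligned, both forms are safe.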
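
Split the 2-cell case into two explicit 4-byte loads combined with a
shift and an OR:

    const dtb_uint32_t *p = addr;
    return ((uint64_t) be32toh(p[0]) << 32) | be32toh(p[1]);

Each of these is a naturally aligned 4-byte load at a known offset,
so neither can fault on Device memory. Strictly speaking, C does not
forbid the compiler from merging them back into a single LDR Xt;
building the early-boot code with -mstrict-align (or using volatile
accesses) is what would rule that out.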
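
For unit-testing the decode on a host, a reference version of the read
can be sketched as below. cell_be32() and read_cells_ref() are
hypothetical names. The sketch assumes the read_cells() semantics shown
in the diff (read at payload + *off, then advance *off by 4 bytes per
cell) and combines two cells as (high << 32) | low, as the patch does.
It decodes byte by byte so that it runs on any host; it checks the
arithmetic only, not what code GCC emits for the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one big-endian 32-bit DTB cell. */
static uint32_t cell_be32(const unsigned char *b)
{
	/* Cast before shifting so 0x80 << 24 does not overflow int. */
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Hypothetical host-side reference for read_cells(): consume
 * `ncells` cells at payload + *off, advance *off past them, and
 * combine a 2-cell value as (high cell << 32) | low cell.
 */
static uint64_t read_cells_ref(const unsigned char *payload,
			       unsigned ncells, size_t *off)
{
	const unsigned char *p = payload + *off;

	assert(ncells == 1 || ncells == 2);
	*off += (size_t)ncells * 4;
	if (ncells == 1)
		return cell_be32(p);
	return ((uint64_t)cell_be32(p) << 32) | cell_be32(p + 4);
}
```

Feeding it the cells <0x1 0x80000000> as a 2-cell value yields
0x180000000 and advances the offset by 8 bytes.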
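
In practice this hits essentially every modern arm64 DTB: any reg
property whose parent node has `#address-cells = <2>` or
`#size-cells = <2>` (which is nearly all of them) takes the 2-cell
path, and because the payload is only guaranteed 4-byte alignment,
some of those reads land on an address that is not 8-byte aligned.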
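
A sketch of how a reg payload splits into fields shows why. parse_reg(),
cells_at() and struct reg_entry are hypothetical, not existing Mach
code. Each entry is (#address-cells + #size-cells) * 4 bytes, so with
both set to 2 every field goes through the 2-cell path and sits at a
multiple of 8 bytes from the start of the payload. Whether those reads
are 8-byte aligned therefore depends only on where the payload begins,
which the format guarantees to be 4-byte aligned and nothing more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one (address, size) pair from a reg property. */
struct reg_entry {
	uint64_t base;
	uint64_t size;
};

/* Decode one big-endian 32-bit cell. */
static uint32_t be32_at(const unsigned char *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Read a 1- or 2-cell value at prop + *off and advance *off. */
static uint64_t cells_at(const unsigned char *prop, unsigned n, size_t *off)
{
	uint64_t v = 0;

	assert(n == 1 || n == 2);
	for (unsigned i = 0; i < n; i++)
		v = (v << 32) | be32_at(prop + *off + 4 * i);
	*off += (size_t)n * 4;
	return v;
}

/*
 * Hypothetical walker: split a reg payload of `len` bytes into
 * entries using the parent's #address-cells and #size-cells.
 * Returns the number of entries stored in out[].
 */
static size_t parse_reg(const unsigned char *prop, size_t len,
			unsigned addr_cells, unsigned size_cells,
			struct reg_entry *out, size_t max)
{
	size_t stride = (size_t)(addr_cells + size_cells) * 4;
	size_t off = 0, n = 0;

	while (n < max && off + stride <= len) {
		out[n].base = cells_at(prop, addr_cells, &off);
		out[n].size = cells_at(prop, size_cells, &off);
		n++;
	}
	return n;
}
```

For example, a memory node such as
`reg = <0x0 0x40000000 0x0 0x8000000>` under a parent with both cell
counts set to 2 decodes to base 0x40000000 and size 0x8000000.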
---
device/dtb.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/device/dtb.c b/device/dtb.c
index 529dae5e..05485ded 100644
--- a/device/dtb.c
+++ b/device/dtb.c
@@ -346,7 +346,7 @@ static uint64_t read_cells(
unsigned short size,
vm_size_t *off)
{
- uint64_t tmp;
+ const dtb_uint32_t *p;
addr = (const unsigned char *) addr + *off;
*off += size * 4;
@@ -357,8 +357,19 @@ static uint64_t read_cells(
case 1:
return be32toh(*(const dtb_uint32_t *) addr);
case 2:
- __builtin_memcpy(&tmp, addr, 8);
- return __builtin_bswap64(tmp);
+ /*
+ * DTB property data is only 4-byte aligned, but a
+ * naive 8-byte memcpy gets fused by the compiler
+ * into a single LDR Xt. With the MMU still off
+ * during early boot, all memory is treated as
+ * Device-nGnRnE, where an 8-byte load from an address
+ * that is not 8-byte aligned takes an alignment fault.
+ * Read the two cells separately so that each load is
+ * a naturally aligned 4-byte access.
+ */
+ p = addr;
+ return ((uint64_t) be32toh(p[0]) << 32)
+ | be32toh(p[1]);
default:
panic("Unimplemented cell size: %d\n", size);
}
--
2.54.0