On 24 May 2026, at 02:43, Paulo Duarte <[email protected]> wrote:
>
> Two latent bugs in the aarch64 FPU state plumbing, both surfaced
> the first time an aarch64 thread_get_state / thread_set_state
> caller exercised AARCH64_FLOAT_STATE.
Call me old-fashioned, but shouldn't this be one commit per bug?
> 1. Field-order bug in _fpu_save_state / _fpu_load_state
> ============================================================
>
> struct aarch64_float_state in <mach/aarch64/thread_status.h> lays
> out fpsr at the first offset after v[32], with fpcr second:
>
>     struct aarch64_float_state {
>         __int128 v[32];
>         uint64_t fpsr;          /* offset +512 */
>         uint64_t fpcr;          /* offset +520 */
>         uint64_t fpmr;
>         uint64_t fp_reserved;
>     };
>
> The save/load asm in aarch64/aarch64/locore.S read and wrote them in
> the opposite order, storing FPCR into the fpsr slot and FPSR into the
> fpcr slot, which silently corrupted both fields across every
> thread_get_state / thread_set_state cycle.
>
> A caller that loads a known FPCR via `msr fpcr, ...` and reads it
> back through thread_get_state(AARCH64_FLOAT_STATE) sees 0 instead,
> because the value was stored at the fpsr offset.
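
[A minimal C model of the field-order bug described above, not the actual locore.S code. The struct mirrors the quoted layout; the names float_state_sketch, save_swapped and save_fixed are hypothetical. It assumes a compiler with __int128 (GCC or Clang on a 64-bit target) and, for the "reads back 0" case, that FPSR happened to be 0 at save time.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout quoted above; the _sketch name is hypothetical. */
struct float_state_sketch {
    __int128 v[32];
    uint64_t fpsr;
    uint64_t fpcr;
    uint64_t fpmr;
    uint64_t fp_reserved;
};

/* The offsets the patch description relies on. */
static_assert(offsetof(struct float_state_sketch, fpsr) == 512,
              "fpsr is the first field after v[32]");
static_assert(offsetof(struct float_state_sketch, fpcr) == 520,
              "fpcr follows fpsr");

/* Models the swapped stores: FPCR lands in the fpsr slot and vice versa. */
static void save_swapped(struct float_state_sketch *s,
                         uint64_t fpsr, uint64_t fpcr)
{
    s->fpsr = fpcr;
    s->fpcr = fpsr;
}

/* Models the corrected stores: each register goes to its own slot. */
static void save_fixed(struct float_state_sketch *s,
                       uint64_t fpsr, uint64_t fpcr)
{
    s->fpsr = fpsr;
    s->fpcr = fpcr;
}
```

[With FPSR equal to 0 and a nonzero FPCR, save_swapped leaves the fpcr field at 0, which is the symptom a thread_get_state caller would observe.]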
>
> 2. Alignment-check bug in thread_setstatus
> ============================================================
>
> thread_setstatus() in aarch64/aarch64/pcb.c rejected
> AARCH64_FLOAT_STATE writes whose tstate pointer wasn't aligned to
> alignof(struct aarch64_float_state). The struct's first member is
> __int128 v[32], giving the type a 16-byte alignment requirement.
>
> The MIG-generated _Xthread_set_state stub places `new_state[]` at
> offset 40 from the request message header, which is only 8-byte
> aligned. Even when the user-side buffer is perfectly aligned, the
> in-kernel buffer the stub hands to thread_setstatus is at the
> fixed offset 40 — so the alignment check rejected every legitimate
> AARCH64_FLOAT_STATE write with KERN_INVALID_ARGUMENT.
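
[The alignment arithmetic above can be checked with a short C sketch. It assumes the request message itself starts on a 16-byte boundary, which the description does not state; NEW_STATE_OFFSET, is_aligned and new_state_is_aligned are hypothetical names, and only the offset of 40 comes from the text.]

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the quoted layout; the _sketch name is hypothetical. */
struct float_state_sketch {
    __int128 v[32];
    uint64_t fpsr;
    uint64_t fpcr;
    uint64_t fpmr;
    uint64_t fp_reserved;
};

/* Offset of new_state[] from the request header, as quoted above. */
#define NEW_STATE_OFFSET 40

/* True when addr is a multiple of alignment (a power of two). */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

/* Would a struct pointer placed at msg_base + 40 pass the alignof check? */
static int new_state_is_aligned(const void *msg_base)
{
    uintptr_t p = (uintptr_t)msg_base + NEW_STATE_OFFSET;
    return is_aligned(p, alignof(struct float_state_sketch));
}
```

[Since 40 mod 16 is 8, a 16-byte-aligned message base always yields an 8-byte-aligned but never a 16-byte-aligned new_state pointer.]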
>
> Copy the state into a stack-local that has the natural type
> alignment before validating and storing. This is also what would
> have to happen if we wanted to read __int128 from the buffer
> directly on a strict-alignment-trapping config; the byte-by-byte
> copy handles the under-aligned source without faulting.
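
[A sketch of the copy-in approach described above, under stated assumptions: it is not the actual pcb.c change. The function name, the byte-count size parameter and the 0 / -1 return values are hypothetical stand-ins, and the validation step is left as a placeholder because the description does not say what is validated.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the quoted layout; the _sketch name is hypothetical. */
struct float_state_sketch {
    __int128 v[32];
    uint64_t fpsr;
    uint64_t fpcr;
    uint64_t fpmr;
    uint64_t fp_reserved;
};

/*
 * Copy a possibly under-aligned state buffer into a naturally aligned
 * local, then store it. Returns 0 on success, -1 if the buffer is too
 * small (standing in for KERN_INVALID_ARGUMENT).
 */
static int set_float_state_sketch(struct float_state_sketch *dst,
                                  const void *tstate, size_t size)
{
    struct float_state_sketch local; /* has the type's natural alignment */

    if (size < sizeof local)
        return -1;

    /* memcpy makes no alignment assumption about the source pointer. */
    memcpy(&local, tstate, sizeof local);

    /* Placeholder: any validation of the fields would operate on local. */

    *dst = local;
    return 0;
}
```

[The source pointer is never dereferenced as a struct aarch64_float_state, so no 16-byte access is issued against the 8-byte-aligned message buffer.]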
Would it not be more appropriate to fix MIG to respect alignment
requirements rather than hack around it by ignoring the unaligned data?
Jessica