On 24 May 2026, at 02:43, Paulo Duarte <[email protected]> wrote:
> 
> Two latent bugs in the aarch64 FPU state plumbing, both surfaced
> the first time an aarch64 thread_get_state / thread_set_state
> caller exercised AARCH64_FLOAT_STATE.

Call me old-fashioned, but could we have one commit per bug?

> 1. Field-order bug in _fpu_save_state / _fpu_load_state
> ============================================================
> 
> struct aarch64_float_state in <mach/aarch64/thread_status.h> lays
> out fpsr at the first offset after v[32], with fpcr second:
> 
>    struct aarch64_float_state {
>        __int128 v[32];
>        uint64_t fpsr;          /* offset +512 */
>        uint64_t fpcr;          /* offset +520 */
>        uint64_t fpmr;
>        uint64_t fp_reserved;
>    };
> 
> The save/load asm in aarch64/aarch64/locore.S read and wrote them in
> the opposite order, storing FPCR into the fpsr slot and FPSR into the
> fpcr slot, which silently corrupted both fields across every
> thread_get_state / thread_set_state cycle.
> 
> A caller that loads a known FPCR via `msr fpcr, ...` and reads it
> back through thread_get_state(AARCH64_FLOAT_STATE) sees 0 instead,
> because the value was stored at the fpsr offset.
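
For illustration only, the layout described above can be pinned down at
compile time. This is a minimal sketch, not the patch itself:
struct float_state_sketch is a hypothetical mirror of the quoted struct,
and struct fp_ctl_regs / save_ctl_regs() are made-up C stand-ins for what
the locore.S save path is expected to do. It assumes a compiler with
__int128 support (GCC or Clang on a 64-bit target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout quoted above; the real definition
 * lives in <mach/aarch64/thread_status.h>. */
struct float_state_sketch {
    __int128 v[32];
    uint64_t fpsr;
    uint64_t fpcr;
    uint64_t fpmr;
    uint64_t fp_reserved;
};

/* 32 registers of 16 bytes each put the first control word at +512. */
static_assert(offsetof(struct float_state_sketch, fpsr) == 512,
              "fpsr must be the first field after v[32]");
static_assert(offsetof(struct float_state_sketch, fpcr) == 520,
              "fpcr must follow fpsr");

/* Stand-in for the values of the two hardware registers (assumption:
 * the real code obtains them with mrs in assembly). */
struct fp_ctl_regs {
    uint64_t fpsr;
    uint64_t fpcr;
};

/* Save path with the field order matching the struct: FPSR goes to the
 * fpsr slot, FPCR to the fpcr slot. */
static void save_ctl_regs(struct float_state_sketch *st,
                          const struct fp_ctl_regs *hw)
{
    st->fpsr = hw->fpsr;
    st->fpcr = hw->fpcr;
}
```

With the order as in save_ctl_regs(), a value written to FPCR comes back
in the fpcr field; swapping the two assignments reproduces the symptom
described above.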
> 
> 2. Alignment-check bug in thread_setstatus
> ============================================================
> 
> thread_setstatus() in aarch64/aarch64/pcb.c rejected
> AARCH64_FLOAT_STATE writes whose tstate pointer wasn't aligned to
> alignof(struct aarch64_float_state).  The struct's first member is
> __int128 v[32], giving the type a 16-byte alignment requirement.
> 
> The MIG-generated _Xthread_set_state stub places `new_state[]` at
> offset 40 from the request message header, which is only 8-byte
> aligned.  Even when the user-side buffer is perfectly aligned, the
> in-kernel buffer the stub hands to thread_setstatus is at the
> fixed offset 40 — so the alignment check rejected every legitimate
> AARCH64_FLOAT_STATE write with KERN_INVALID_ARGUMENT.
> 
> Copy the state into a stack-local that has the natural type
> alignment before validating and storing.  This is also what would
> have to happen if we wanted to read __int128 from the buffer
> directly on a strict-alignment-trapping config; the byte-by-byte
> copy handles the under-aligned source without faulting.
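
A rough sketch of the copy-to-aligned-local approach described above,
for readers following along. The names here (float_state_sketch,
is_aligned_for_state, copy_float_state) are hypothetical and not taken
from pcb.c; the sketch only assumes that the incoming buffer may sit at
an 8-byte-aligned address such as offset 40 of a 16-byte-aligned
message.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct aarch64_float_state (assumes __int128
 * support, i.e. GCC or Clang on a 64-bit target). */
struct float_state_sketch {
    __int128 v[32];
    uint64_t fpsr;
    uint64_t fpcr;
    uint64_t fpmr;
    uint64_t fp_reserved;
};

/* Returns nonzero if p satisfies the struct's 16-byte alignment. */
static int is_aligned_for_state(const void *p)
{
    return ((uintptr_t)p % alignof(struct float_state_sketch)) == 0;
}

/* Copy a possibly under-aligned state buffer into a properly aligned
 * object. memcpy makes no alignment assumption about its source, so
 * this does not fault even where misaligned 128-bit loads would.
 * Returns 0 on success, -1 if the buffer is too short. */
static int copy_float_state(struct float_state_sketch *out,
                            const void *tstate, size_t len)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, tstate, sizeof *out);
    return 0;
}
```

A caller would declare a struct float_state_sketch on the stack, fill it
with copy_float_state(), and validate and store from the local copy
rather than dereferencing the message buffer directly.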

Would it not be more appropriate to fix MIG to respect alignment
requirements rather than hack around it by ignoring the unaligned data?

Jessica

