Alright, I dug up my old laptop with the aarch64-gnu build tree, this
might help answer some of the mysteries/uncertainties:

zero_out_bss is indeed inlined into c_boot_entry, and memset doesn't
save/restore x30 or x29 (or anything at all). So I was just lucky not
to hit the issue.

read_cells does an 8-byte load via 'ldr x0, [x0, x3]'. Perhaps neither
QEMU nor the real hardware board that we were testing this on emulates
that memory property?

The check in thread_setstatus is 'ands x1, x2, #0xf; b.ne
<thread_setstatus +124>', so it does expect 16-byte alignment.
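
In C terms, that instruction pair amounts to roughly the following. This
is only a sketch to illustrate the check, not the actual
thread_setstatus source; the helper name is_aligned_16 is made up here.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from gnumach. It mirrors
   'ands x1, x2, #0xf; b.ne ...': if any of the low four address
   bits is set, the pointer is not 16-byte aligned. */
static int is_aligned_16(uintptr_t addr)
{
    return (addr & 0xf) == 0;
}
```

An address at +56 from a 16-aligned base has the low bits 0x8, so it
fails this check even though it is 8-byte aligned.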

On Wed, May 27, 2026 at 12:09 AM Paulo Fernando Barbosa Duarte
<[email protected]> wrote:
>
>
> > On 26 May 2026, at 14:50, Sergey Bugaev <[email protected]> wrote:
> >
> > But that answers neither of my two questions :)
>
> Sergey, sorry for the runaround. Digging through the source now.
>
> MIG's inline vs out-of-line is decided at type-declaration time,
> not at runtime. In mig/type.c around line 632 (itVarArrayDecl),
> only the unbounded `array[*]` form sets itIndefinite, which is the
> flag that allows the switch to out-of-line above ~2048 bytes.
> `array[*:1024]` keeps a fixed bound so it stays inline regardless
> of new_state_count. The generated _Xthread_set_state confirms it
> from the other side, it rejects msgt_inline != TRUE with
> MIG_BAD_ARGUMENTS and has the +56 offset hardcoded in the size
> check. So there is no MIG knob to flip this RPC to out-of-line.

I see, indeed. Thank you.

> On the user-page question, my earlier intuition was off. On LP64
> copyinmsg in gnumach/ipc/copy_user.c does a plain copyin of
> the whole user message into a kalloc'd kmsg, so the new_state
> pointer the stub passes to thread_set_state is into that kmsg, not
> into the user's pages. Even on the genuine out-of-line path
> (ipc_kmsg.c around line 1442), vm_map_copyin produces a kernel-side
> vm_map_copy_t, the kernel never holds a raw user pointer.

Right; I remember there was some special API at the intersection of
the VM and IPC subsystems that the kernel routines had to call to
access the OOL memory. But of course I don't remember any details.

> That moves me away from the kernel-side copy workaround and
> towards relaxing the struct instead. Replacing __int128 v[32]
> with int64_t v[64] in aarch64_float_state drops alignof from 16 to
> 8, which matches what MIG actually delivers (+56 from the message
> body, 8-aligned), and the direct cast in thread_setstatus becomes
> well-defined with no workaround needed. Your own db8dacb5 flagged
> the __int128 portability concern so it feels like a soft landing.
> No public aarch64 ABI consumer yet, so the break seems acceptable,
> but happy to hear if either of you sees a reason to prefer the
> kernel-side copy.

Reducing the alignment requirement for the struct sounds reasonable, but:
1. We still want v[4] to refer to the v4 register, not to a half of v2
:) So maybe try something like 'struct { int64_t lower, upper; }
v[32];' instead?
2. What are the alignment requirements of the stp/ldp instructions
that _fpu_{save,load}_state implementations use? The callees should
ensure they provide the required alignment.
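
To make point 1 concrete, here is a rough sketch of the layout I have in
mind. The type and function names (vreg_sketch, float_state_sketch,
vreg_offset) are made up for illustration and are not the real
aarch64_float_state definition. It uses 32 elements, one per V register,
the same count as the existing __int128 v[32].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. One element per
   128-bit V register, split into two 64-bit halves. */
struct vreg_sketch {
    int64_t lower;
    int64_t upper;
};

struct float_state_sketch {
    struct vreg_sketch v[32];
};

/* Each element is still 16 bytes wide, but it is only as strictly
   aligned as int64_t (8 on LP64 AArch64), not 16 like __int128. */
_Static_assert(sizeof(struct vreg_sketch) == 16,
               "one element covers a full V register");
_Static_assert(_Alignof(struct vreg_sketch) == _Alignof(int64_t),
               "alignment follows int64_t");

/* Byte offset of register vN inside the sketch struct. */
static size_t vreg_offset(unsigned n)
{
    return offsetof(struct float_state_sketch, v)
           + n * sizeof(struct vreg_sketch);
}
```

With this, v[4] sits at byte offset 64, the same place the v4 register
has in the __int128 layout, so only the alignment requirement changes.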

Sergey
