Alright, I dug up my old laptop with the aarch64-gnu build tree; this might help answer some of the mysteries/uncertainties:
zero_out_bss is indeed inlined into c_boot_entry, and memset doesn't save/restore x30 or x29 (or anything at all). So I was just lucky not to hit the issue.

read_cells does an 8-byte load via 'ldr x0, [x0, x3]'. Perhaps neither QEMU nor the real hardware board that we were testing this on emulates that memory property?

The check in thread_setstatus is 'ands x1, x2, #0xf; b.ne <thread_setstatus+124>', so it does expect 16-byte alignment.

On Wed, May 27, 2026 at 12:09 AM Paulo Fernando Barbosa Duarte <[email protected]> wrote:
>
>
> > On 26 May 2026, at 14:50, Sergey Bugaev <[email protected]> wrote:
> >
> > But that answers neither of my two questions :)
>
> Sergey, sorry for the runaround. Digging through the source now.
>
> MIG's inline vs out-of-line is decided at type-declaration time,
> not at runtime. In mig/type.c around line 632 (itVarArrayDecl),
> only the unbounded `array[*]` form sets itIndefinite, which is the
> flag that allows the switch to out-of-line above ~2048 bytes.
> `array[*:1024]` keeps a fixed bound so it stays inline regardless
> of new_state_count. The generated _Xthread_set_state confirms it
> from the other side: it rejects msgt_inline != TRUE with
> MIG_BAD_ARGUMENTS and has the +56 offset hardcoded in the size
> check. So there is no MIG knob to flip this RPC to out-of-line.

I see, indeed. Thank you.

> On the user-page question, my earlier intuition was off. On LP64,
> copyinmsg in gnumach/ipc/copy_user.c does a plain copyin of
> the whole user message into a kalloc'd kmsg, so the new_state
> pointer the stub passes to thread_set_state is into that kmsg, not
> into the user's pages. Even on the genuine out-of-line path
> (ipc_kmsg.c around line 1442), vm_map_copyin produces a kernel-side
> vm_map_copy_t; the kernel never holds a raw user pointer.

Right; I remember there was some special API at the intersection of the VM and IPC subsystems that the kernel routines had to call to access the OOL memory.
But of course I don't remember any details.

> That moves me away from the kernel-side copy workaround and
> towards relaxing the struct instead. Replacing __int128 v[32]
> with int64_t v[64] in aarch64_float_state drops alignof from 16 to
> 8, which matches what MIG actually delivers (+56 from the message
> body, 8-aligned), and the direct cast in thread_setstatus becomes
> well-defined with no workaround needed. Your own db8dacb5 flagged
> the __int128 portability concern, so it feels like a soft landing.
> No public aarch64 ABI consumer yet, so the break seems acceptable,
> but happy to hear if either of you sees a reason to prefer the
> kernel-side copy.

Reducing the alignment requirement for the struct sounds reasonable, but:

1. We still want v[4] to refer to the v4 register, not to a half of v2 :) So maybe try something like 'struct { int64_t lower, upper; } v[32];' instead?

2. What are the alignment requirements of the stp/ldp instructions that the _fpu_{save,load}_state implementations use? The callees should ensure they provide the required alignment.

Sergey
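A minimal C sketch of the two-`int64_t`-per-register layout from point 1. It uses 32 elements, one per register v0..v31, so the total size matches `__int128 v[32]`. The names `struct vreg`, `fp_state_pairs`, `fp_state_int128` and `vreg_half` are hypothetical. Only the vector-register array of aarch64_float_state is modelled; any other fields of the real struct are left out. The `_Static_assert`s check the three properties under discussion: the size is unchanged, only `int64_t` alignment is required instead of 16, and `v[4]` still starts at register v4.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one 128-bit vector register split into two 64-bit halves. */
struct vreg {
    int64_t lower;  /* low 64 bits (lower address on little-endian aarch64) */
    int64_t upper;  /* high 64 bits */
};

/* Hypothetical stand-in for the register array of aarch64_float_state. */
struct fp_state_pairs {
    struct vreg v[32];  /* v[n] is register vn, n = 0..31 */
};

#ifdef __SIZEOF_INT128__
/* The current layout, for comparison (__int128 is a GCC/Clang extension). */
struct fp_state_int128 {
    __int128 v[32];
};
_Static_assert(sizeof(struct fp_state_pairs) == sizeof(struct fp_state_int128),
               "same total size as __int128 v[32]");
#endif

/* Alignment drops to that of int64_t (8 on LP64) instead of 16. */
_Static_assert(alignof(struct fp_state_pairs) == alignof(int64_t),
               "only int64_t alignment is required");

/* Each element is still 16 bytes, so v[4] starts at byte 64 (register v4),
 * rather than being the lower half of v2 as with a flat int64_t v[64]. */
_Static_assert(sizeof(struct vreg) == 16, "one element per 128-bit register");
_Static_assert(offsetof(struct fp_state_pairs, v[4]) == 64, "v[4] is register v4");

/* Hypothetical accessor: returns the lower (half == 0) or upper (half == 1)
 * 64 bits of register vn. */
static int64_t vreg_half(const struct fp_state_pairs *s, unsigned n, unsigned half)
{
    assert(n < 32 && half < 2);
    return half ? s->v[n].upper : s->v[n].lower;
}
```

Because each `v[n]` stays a 16-byte unit, code that indexes registers by number keeps working. Only code that treats an element as a single 128-bit integer would need to switch to the `lower`/`upper` halves.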

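A sketch of the alignment check in thread_setstatus and of the kernel-side copy workaround mentioned in the thread. When `align` is 16, `is_aligned` is the C equivalent of the `ands x1, x2, #0xf; b.ne` sequence: the pointer passes only if its low four bits are zero. `aligned_state` and `copy_state` are hypothetical names. `aligned_state` stands in for a 512-byte state that requires 16-byte alignment, as an `__int128`-based struct would. The sketch also assumes the message buffer itself is 16-byte aligned, so the payload at +56 ends up 8-aligned but not 16-aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Equivalent of 'ands x1, x2, #0xf; b.ne ...' for align == 16:
 * p is aligned iff its low log2(align) bits are all zero. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical stand-in for an __int128-based state: 512 bytes that
 * require 16-byte alignment. */
struct aligned_state {
    alignas(16) unsigned char bytes[512];
};

/* Hypothetical kernel-side copy: instead of casting a possibly misaligned
 * payload pointer to the state type, memcpy the bytes into a properly
 * aligned object (memcpy itself has no alignment requirement). */
static void copy_state(struct aligned_state *dst, const void *src)
{
    memcpy(dst->bytes, src, sizeof dst->bytes);
    /* The destination now satisfies what a direct cast would have needed. */
    assert(is_aligned(dst, 16));
}
```

Under that assumption, `is_aligned(msg + 56, 8)` holds while `is_aligned(msg + 56, 16)` does not, and this is the case the check catches. `copy_state` avoids the problem at the cost of one extra copy of the state per call.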