> On 27 May 2026, at 15:43, Sergey Bugaev <[email protected]> wrote:
>
> Alright, I dug up my old laptop with the aarch64-gnu build tree, this
> might help answer some of the mysteries/uncertainties:
>
> zero_out_bss is indeed inlined into c_boot_entry, and memset doesn't
> save/restore x30 or x29 (or anything at all). So I was just lucky not
> to hit the issue.
>
> read_cells does a 8-byte load via 'ldr x0, [x0, x3]'. Perhaps neither
> QEMU nor the real hardware board that we were testing this on emulates
> that memory property?
>
> The check in thread_setstatus is 'ands x1, x2, #0xf; b.ne
> <thread_setstatus +124>', so it does expect the alignment of 16.
That makes sense. I thought that could be the case. Good to know.
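For reference, here is how I read that check in C. This is only a sketch based on the disassembly you quoted, not the actual thread_setstatus source, and is_aligned_16 is a made-up name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a gnumach function.
   Mirrors 'ands x1, x2, #0xf; b.ne ...': if any of the low four
   address bits is set, the pointer is not 16-byte aligned. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 0xf) == 0;
}

/* The +56 offset mentioned in this thread is a multiple of 8 but not
   of 16, so a 16-aligned base plus 56 would fail the check above. */
static_assert((56 & 0x7) == 0 && (56 & 0xf) != 0,
              "+56 is 8-aligned but not 16-aligned");
```

So anything that is only 8-aligned takes the b.ne branch, which fits what we are seeing.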
>> That moves me away from the kernel-side copy workaround and
>> towards relaxing the struct instead. Replacing __int128 v[32]
>> with int64_t v[64] in aarch64_float_state drops alignof from 16 to
>> 8, which matches what MIG actually delivers (+56 from the message
>> body, 8-aligned), and the direct cast in thread_setstatus becomes
>> well-defined with no workaround needed. Your own db8dacb5 flagged
>> the __int128 portability concern so it feels like a soft landing.
>> No public aarch64 ABI consumer yet, so the break seems acceptable,
>> but happy to hear if either of you sees a reason to prefer the
>> kernel-side copy.
>
> Reducing the alignment requirement for the struct sounds reasonable, but:
> 1. We still want v[4] to refer to the v4 register, not to a half of v2
> :) So maybe try something like 'struct { int64_t lower, upper; }
> v[64];' instead?
> 2. What are the alignment requirements of the stp/ldp instructions
> that _fpu_{save,load}_state implementations use? The callees should
> ensure they provide the required alignment.
That’s a good approach. If we want to be fancy, we could even use a union
to allow access to the multiple types of registers. If any consumer or the
kernel needs to do arithmetic or atomic operations on the __int128, we could
easily provide a macro to force the inline as needed.
I don’t think most use cases will need to access the registers, so forcing
the alignment every time just for the sake of it seems wasteful. Alignment
on use seems the best approach, in my opinion.
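
A rough sketch of what I have in mind, just to make the idea concrete. All the names here are placeholders (nothing below is the current aarch64_float_state definition), I am assuming 32 entries to match the 32 V registers, and the lower/upper naming assumes little-endian:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register view: 16 bytes wide, but only 8-byte aligned. */
union aarch64_vreg_sketch {
    struct { uint64_t lower, upper; } q;  /* the two halves of vN */
    uint64_t d[2];
    uint32_t s[4];
    uint8_t  b[16];
};

/* Hypothetical state struct: v[4] is register v4, not half of v2. */
struct float_state_sketch {
    union aarch64_vreg_sketch v[32];
};

static_assert(sizeof(union aarch64_vreg_sketch) == 16,
              "one entry per 128-bit register");
static_assert(alignof(struct float_state_sketch) <= 8,
              "no 16-byte alignment requirement on the struct");

/* Alignment on use: a 16-byte-aligned temporary for the callers
   that really need one. */
typedef struct { alignas(16) unsigned char bytes[16]; } aligned_q_sketch;

static aligned_q_sketch vreg_load_aligned(const union aarch64_vreg_sketch *r)
{
    aligned_q_sketch out;
    memcpy(out.bytes, r, sizeof out.bytes);  /* safe for any source alignment */
    return out;
}
```

Callers that only copy the state around never pay for the alignment, and the few that need a 16-byte-aligned value go through the helper (or a macro doing the same thing).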
Paulo