https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124697
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Mar 2026, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124697
>
> --- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #4)
> > (In reply to H.J. Lu from comment #3)
> > > [hjl@gnu-tgl-3 pr124697]$ cat foo.c
> > > typedef double v4df __attribute__((vector_size(32)));
> > > typedef double v2df __attribute__((vector_size(16)));
> > > typedef struct {
> > >   v2df a[2];
> > > } c __attribute__((aligned(32)));
> > > extern v4df d;
> > > void
> > > e (float a1, float a2, float a3, float a4, float a5, float a6, c f)
> > > {
> > >   d = *(v4df *) &f;
> > > }
> > > [hjl@gnu-tgl-3 pr124697]$ make foo.s
> > > /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
> > > -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/
> > > -O2
> > > -march=x86-64-v4 -S foo.c
> > > [hjl@gnu-tgl-3 pr124697]$ cat foo.s
> > >         .file   "foo.c"
> > >         .text
> > >         .p2align 4
> > >         .globl  e
> > >         .type   e, @function
> > > e:
> > > .LFB0:
> > >         .cfi_startproc
> > >         pushq   %rbp
> > >         .cfi_def_cfa_offset 16
> > >         .cfi_offset 6, -16
> > >         movq    %rsp, %rbp
> > >         .cfi_def_cfa_register 6
> > >         vmovapd 16(%rbp), %ymm0    <<<<<<< f is aligned at 16 bytes.
> >
> > Yes.  This is wrong code.
> > My patch would have fixed it, doing
> > effectively (but restricted to x86 at this point)
> >
> > diff --git a/gcc/function.cc b/gcc/function.cc
> > index 46c0d8b54c2..d44815afc16 100644
> > --- a/gcc/function.cc
> > +++ b/gcc/function.cc
> > @@ -2840,7 +2840,7 @@ assign_parm_adjust_stack_rtl (tree parm, struct
> > assign_parm_data_one *data)
> >                           MEM_ALIGN (stack_parm))))
> >           || (data->nominal_type
> >               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> > -             && (MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY
> > +             && (MEM_ALIGN (stack_parm) < BIGGEST_ALIGNMENT
>
> What happens if BIGGEST_ALIGNMENT > PREFERRED_STACK_BOUNDARY and
> BIGGEST_ALIGNMENT > MAX_SUPPORTED_STACK_ALIGNMENT?

The latter would be an unsupported config.  But what _actually_ happens,
as you can see on aarch64, is that we then allocate an aligned stack
slot not by re-aligning the stack pointer but with alloca-like code,
rounding up the size and then using the aligned portion of the slot.
IIRC only x86 can re-align the stack pointer at function entry/exit.
