https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412
Dmitry Kazakov <dimula73 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimula73 at gmail dot com

--- Comment #25 from Dmitry Kazakov <dimula73 at gmail dot com> ---
Hi, all!

I would like to add one more test file, related to the problem.

If GCC tries to call a function, that accepts a __m256 register as a
parameter, it unloads this parameter into the stack using an **aligned** move
(vmovaps), but the alignment guarantee on Windows is only 16-byte. It means
that the application will crash because of unaligned memory access.

Affected versions: GCC 7.3.0 (MinGW64), GCC 8.1.0 (MinGW64)

Here is the testing source (see also in an attachment):

#include <intrin.h>

struct X {
    alignas(32) __m256 d;
};

void g1(X);
void g2(const X&);
void g3(const void *);

void f(float *ptr)
{
    X x = {_mm256_load_ps(ptr)};
    g1(x);  // BUG: passes via unaligned (whatever rsp alignment is) stack
    g2(x);  // OK: passes via aligned stack location
    g3(&x); // OK: passes via aligned stack location
}

Compiled result (-O2 -march=skylake):

_Z1fPf:
.LFB5135:
        pushq   %rbx
        .seh_pushreg    %rbx
        addq    $-128, %rsp
        .seh_stackalloc 128
        .seh_endprologue
        vmovaps (%rcx), %ymm0
        leaq    95(%rsp), %rbx
        leaq    32(%rsp), %rcx
        andq    $-32, %rbx
        vmovaps %ymm0, (%rbx)       # %rbx is properly aligned
        vmovaps %ymm0, 32(%rsp)     # %rsp may be unaligned
        vzeroupper
        call    _Z2g11X
        movq    %rbx, %rcx
        call    _Z2g2RK1X
        movq    %rbx, %rcx
        call    _Z2g3PKv
        nop
        subq    $-128, %rsp
        popq    %rbx
        ret

Related bug in Vc library:
https://github.com/VcDevel/Vc/issues/241

Related bug in Krita:
https://bugs.kde.org/show_bug.cgi?id=406209
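
The address arithmetic in the listing above can be checked in isolation. The following is only a sketch in plain integer math, not part of the attached test case: the helper names (is_aligned, by_value_slot, local_slot) and the sample %rsp values are hypothetical, and it assumes %rsp is 16-byte aligned after the prologue, as the Windows x64 ABI guarantees. It models the two store targets: 32(%rsp), the temporary used for g1(x), and (%rsp + 95) & -32, the slot of the local x.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Models "leaq 32(%rsp), %rcx": the by-value temporary passed to g1.
constexpr std::uint64_t by_value_slot(std::uint64_t rsp) {
    return rsp + 32;
}

// Models "leaq 95(%rsp), %rbx; andq $-32, %rbx": the slot of the local x.
constexpr std::uint64_t local_slot(std::uint64_t rsp) {
    return (rsp + 95) & ~std::uint64_t{31};
}

// Hypothetical sample values: one 32-byte aligned %rsp and one that is
// only 16-byte aligned, the two cases a 16-byte guarantee allows.
void check_frame_examples() {
    const std::uint64_t rsp32 = 0x1000; // happens to be 32-byte aligned
    const std::uint64_t rsp16 = 0x1010; // only 16-byte aligned

    // The local slot is 32-byte aligned in both cases.
    assert(is_aligned(local_slot(rsp32), 32));
    assert(is_aligned(local_slot(rsp16), 32));

    // The by-value temporary is aligned only when %rsp happens to be.
    assert(is_aligned(by_value_slot(rsp32), 32));
    assert(!is_aligned(by_value_slot(rsp16), 32)); // vmovaps would fault here
}
```

Under these assumptions the by-value temporary inherits whatever alignment %rsp has, so the vmovaps to 32(%rsp) is valid only about half of the time, while the masked address is always valid. This matches the comments in the test source: g2(x) and g3(&x) are marked OK because they pass the address of the realigned local.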