I'll try to be concise: The stack on x64 is 16-byte aligned,
enough for SSE registers, but not for the 32-byte AVX registers.
A data structure containing AVX vectors therefore cannot be
guaranteed to be correctly aligned on the stack, yet we get no
warning if we try anyway:

align(32) struct Matrix4x4 {
    float[4][4] m;
}

void main() {
    import core.simd;
    Matrix4x4 matrix;  // No warning
    float8 vector;     // No warning
}
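
To see whether a given build actually honors the requested alignment, one can compare the declared alignment with the address the variable ends up at. This is only a minimal sketch: isAligned is a hypothetical helper of mine, not a library function, and the result depends on compiler, flags, and target.

```d
import std.stdio;

align(32) struct Matrix4x4 {
    float[4][4] m;
}

/// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
/// Assumes `alignment` is a power of two.
bool isAligned(const(void)* ptr, size_t alignment) {
    return (cast(size_t) ptr & (alignment - 1)) == 0;
}

void main() {
    Matrix4x4 matrix;  // stack variable, placement is up to the compiler
    writefln("declared alignment: %s", Matrix4x4.alignof);
    writefln("actual address:     %s", &matrix);
    writefln("32-byte aligned:    %s", isAligned(&matrix, 32));
}
```

If the last line prints false on some run, the align(32) request was silently dropped for that stack slot.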

Now some people use align(64) purely as a performance hint, for
example to make a 64-byte data structure fill exactly one cache
line (and for all the other things like C interop, file
alignment, etc.). On the other hand, AVX is the first x86
instruction set extension that makes use of alignments above 16
bytes, so the game has changed and will continue to do so with
future x86 SIMD extensions.

Perspective A:

We now have "authoritative" alignments that must be honored,
with explicit warnings/errors if they are not, alongside the
status quo: alignment hints that should be honored, but are
silently ignored on the stack. The language could express the
former with an imagined "forcealign(32)" attribute, which
disallows placing such data structures on the 16-byte aligned
stack. ("forcealign" naturally overrides any smaller "align"
attribute.)
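
Without such an attribute, the usual manual workaround is to over-allocate a buffer and round the address up yourself. The following is just a sketch under that assumption; alignUp is a hypothetical helper, and the fixed 32 stands in for whatever "forcealign" would demand.

```d
import std.stdio;

align(32) struct Matrix4x4 {
    float[4][4] m;
}

/// Hypothetical helper: rounds addr up to the next multiple of
/// `alignment`, which is assumed to be a power of two.
size_t alignUp(size_t addr, size_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

void main() {
    // Over-allocate by alignment - 1 bytes so an aligned slot always fits,
    // no matter where the compiler puts the buffer on the stack.
    ubyte[Matrix4x4.sizeof + 31] raw;
    auto matrix = cast(Matrix4x4*) alignUp(cast(size_t) raw.ptr, 32);

    matrix.m[0][0] = 1.0f;
    assert((cast(size_t) matrix & 31) == 0);
    writeln("aligned slot at ", matrix);
}
```

This wastes up to 31 bytes per object and bypasses normal construction, which is exactly the kind of bookkeeping a compiler-enforced attribute would take off the programmer's hands.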

Perspective B:

AVX vectors should generally be assumed to be unaligned.
Unlike SSE, all AVX instructions except the explicit "aligned
load" ones accept unaligned memory operands, at a potential
speed penalty. Aligned loads could be replaced with unaligned
loads and the code would work again. But as long as compiler
intrinsics continue to emit aligned loads for SIMD, this only
works for AVX code written in asm; intrinsics remain a
heisenbug minefield.


