The XMM registers I am using work efficiently when you feed them memory from arrays aligned to 16 bytes, as the D GC produces. But the YMM registers used by the AVX/AVX2 instructions prefer an alignment of 32 bytes. And the 512-bit vector registers of the Intel Xeon Phi (MIC) prefer arrays aligned to 64 bytes.
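
In current D you can only test this at run time; a minimal sketch (my own, not from the enhancement request), assuming a plain modulus test on the slice pointer is enough to decide whether the vector path is usable:

// Is the slice start aligned to the given boundary?
bool isAlignedTo(T)(const(T)[] a, size_t alignment) {
    return (cast(size_t)a.ptr % alignment) == 0;
}

// usage:
auto data = new double[128];
if (data.isAlignedTo(32)) {
    // Safe to load this data into YMM registers (double4).
    // ...
}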

When I am not using SIMD code and I want a small array of little elements, like an array of 10 ushorts, having it aligned to 16 bytes is a waste of space (even if it helps the GC reduce fragmentation).

So I have written a small enhancement request, suggesting that arrays meant for YMM registers could be allocated with an alignment of 32 bytes:
http://d.puremagic.com/issues/show_bug.cgi?id=10826

Having the array alignments in the D type system could be useful. To be backward-compatible you also need a generic unknown alignment (like a void* among alignments), so you can assign arrays of any alignment to it; it could be denoted with '0'.

Some rough ideas:


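// Hypothetical syntax: the requested alignment becomes part of the
// array type, and __traits(alignment, x) reads it back.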
import core.simd: double2, double4;
auto a = new int[10];
static assert(__traits(alignment, a) == 16);
auto b = new int[128]<32>;
static assert(__traits(alignment, b) == 32);
auto c1 = new double2[128];
auto c2 = new double4[64];
static assert(__traits(alignment, c1) == 16);
static assert(__traits(alignment, c2) == 32);

void foo1(int[]<32> a) {
    // Uses YMM registers to modify a
    // ...
}

void foo2(int[] a)
if (__traits(alignment, a) == 32) {
    // Uses YMM registers to modify a
    // ...
}

void foo3(size_t N)(int[]<N> a) {
    static if (N >= 32) {
        // Uses YMM registers to modify a
        // ...
    } else {
        // ...
    }
}
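
And my own guess (not in the enhancement request) at how the generic unknown alignment '0' from above could look, working like a void* among alignments:

void foo4(int[]<0> a) {
    // No alignment beyond int.alignof can be assumed here.
    // ...
}

auto d1 = new int[128]<32>;
auto d2 = new int[128];  // default GC alignment
foo4(d1);  // OK, the 32-byte alignment converts to the unknown '0'
foo4(d2);  // OK too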

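In the meantime something similar can be done by hand; a minimal sketch (my own names, assuming it's acceptable to over-allocate GC memory and slice at the first 32-byte boundary, and that the element type contains no pointers):

T[] newAligned(T)(size_t n, size_t alignment) {
    auto raw = new ubyte[n * T.sizeof + alignment - 1];
    immutable offset = (alignment - cast(size_t)raw.ptr % alignment) % alignment;
    return cast(T[])raw[offset .. offset + n * T.sizeof];
}

// usage:
auto v = newAligned!double(128, 32);
assert(cast(size_t)v.ptr % 32 == 0);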

Bye,
bearophile
