I was trying to show someone how awesome Dlang was earlier, and how the vector operations are expected to take advantage of the CPU vector instructions, and was dumbstruck when dmd and gdc both failed to auto-vectorize a simple case. I've stripped it down to the bare minimum and loaded the example on the interactive compiler: http://asm.dlang.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22JYWwDg9gTgLgBAY2gUwHQGdQBMDcAoPAMwBsIBDGAbQCYBWANgF05kAPM8Y5AQQAoTyVOkzhkANHAEUaDZgCMAlHgDeeOJNLThzBPnUB6fXCjJ0AV2Ix0cYADs45uenRrElZgF5R7uAFo4cu56xsgwZlD2ungAvgRSQrIs7JzIAEL8mgki4hqCMiKKKq7xAByUAMzUjABuZHBeCGToMBmCZZWMCmTBpRVV1XL1iE0tvR0Kcj2Z7f1R6q6GIeaW1nYOZnJgLurVCD5etT7%2BA0EE6iZhEcPNrVqyCrv40UAAA%3D%22%2C%22compiler%22%3A%22dmd2067%22%2C%22options%22%3A%22-O%20-release%20-inline%20-boundscheck%3Doff%22%7D%5D%7D

The reference documentation for arrays says:
"Implementation note: many of the more common vector operations are expected to take advantage of any vector math instructions available on the target computer."

Does this mean that compilers are expected to use vector instructions for array operations but currently do not, even when the data is properly aligned? I haven't tried LDC yet. Does LDC perform auto-vectorization, and should I switch to it if I plan on using vector ops a lot?

import core.simd;

float[256] exampleA(float[256] a, float[256] b)
{
  float[256] c;
  // results in subss (scalar instruction)
  c[] = a[] - b[];
  return c;
}

float[256] exampleB(float[256] a, float[256] b)
{
  float8[32] va = cast(float8[32])a;
  float8[32] vb = cast(float8[32])b;
  float8[32] vc;

  // results in subps (vector instruction)
  vc[] = va[] - vb[];

  return cast(float[256])vc;
}
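
For comparison, here is a minimal sketch of a third variant that does the subtraction in explicit 128-bit chunks. The name exampleC is hypothetical and not part of the original test case. It assumes a target where core.simd provides float4 (for example x86-64), and it copies through the .array property so it makes no alignment assumptions about the inputs. The subtraction itself is a float4 operation and so should map to a packed instruction; whether the surrounding copies are vectorized will depend on the compiler and flags.

```d
import core.simd;

// Hypothetical variant for comparison (not from the original test case):
// subtract in explicit 4-float chunks using float4.
float[256] exampleC(float[256] a, float[256] b)
{
  float[256] c;
  foreach (i; 0 .. 256 / 4)
  {
    float4 va, vb;
    // .array exposes the vector's lanes as a float[4]; copying through it
    // avoids assuming anything about the alignment of a and b.
    va.array[] = a[4 * i .. 4 * i + 4];
    vb.array[] = b[4 * i .. 4 * i + 4];

    // element-wise subtraction on a SIMD vector type
    float4 vc = va - vb;

    c[4 * i .. 4 * i + 4] = vc.array[];
  }
  return c;
}
```

Comparing the assembly for this version against exampleA and exampleB with the same flags would show whether the scalar subss comes from the array-op lowering or from the loads and stores around it.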
