On Monday, 11 July 2022 at 18:15:16 UTC, Ivan Kazmenko wrote:
Hi.

I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:

```
void add1 (int [] a)
{
    foreach (i; 0..a.length)
        a[i] += 1;
}
```

Here are the outputs: https://godbolt.org/z/GcznbjEaf

From what I gather from the view linked above, DMD does not use XMM registers for speedup and does not unroll the loop. Switching between 32-bit and 64-bit doesn't help either. However, I recall that in the past it was capable of at least some of these optimizations. So, how do I enable them for such a function?

Ivan Kazmenko.

No, not in DMD. DMD generates what looks like 32-bit code adapted to x86_64. LDC may optimize this kind of loop with a three-way branch depending on how many array elements remain, but it can generate both very good loop code (particularly when AVX-512 is available and the struct/data arrangement in memory is unfavorable for SIMD) and very questionable code. You may lose performance for obscure reasons, as if gnomes had decided to steal your precious CPU cycles. When that happens, there is no way to fix it other than going in manually with a disassembler/debugger, replacing the defective optimizations in hot code paths with something faster, and saving the result back to the executable file. (Yikes, I know.)
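
One thing that may be worth trying before reaching for a disassembler is D's array-wise operation syntax, which hands the loop to the compiler and runtime instead of spelling it out by hand. A minimal sketch follows; the name `add1Slice` is made up for illustration, and whether any given compiler actually emits SIMD code for it is an assumption you would have to verify in the generated assembly for your compiler version and target.

```d
// Hypothetical variant of add1 from the question (the name is illustrative only).
void add1Slice (int [] a)
{
    // Array-wise operation: adds 1 to every element of the slice in place.
    // Has the same effect as the foreach loop in add1; the code generated
    // for it depends on the compiler and should be checked in the disassembly.
    a[] += 1;
}
```

Calling `add1Slice (arr)` modifies `arr` in place exactly like `add1` does, so the two versions can be compared side by side in the same Compiler Explorer view.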
