Re: vectorization of a simple loop -- not in DMD?

z via Digitalmars-d-learn Thu, 14 Jul 2022 02:01:03 -0700

On Monday, 11 July 2022 at 18:15:16 UTC, Ivan Kazmenko wrote:

Hi.
I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:
```
void add1 (int [] a)
{
    foreach (i; 0..a.length)
        a[i] += 1;
}
```

Here are the outputs: https://godbolt.org/z/GcznbjEaf
From what I gather at the view linked above, DMD does not use XMM registers for speedup, and does not unroll the loop either. Switching between 32-bit and 64-bit doesn't help either. However, I recall in the past it was capable of at least some of these optimizations. So, how do I enable them for such a function?
Ivan Kazmenko.

No, not in DMD. DMD generates what looks like 32-bit code adapted to x86_64.

LDC may optimize this kind of loop with a three-way branch depending on how many array elements remain, but it can generate both very good loop code (particularly when AVX-512 is available and the struct/data arrangement in memory is unfavorable for SIMD) and very questionable code.

You may be losing performance for obscure reasons that look like gnomes decided to steal your precious CPU cycles, and when that happens there is no way to fix it other than manually going in with a disassembler/debugger, changing the defective optimizations in hot code paths to something faster, and then saving the result back to the executable file. (Yikes, I know.)
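For comparison, here is a minimal sketch of the same operation written as a D array (slice) operation instead of an explicit loop. The function name `add1ArrayOp` is hypothetical and not from the thread. Array operations are lowered to library code in druntime rather than to a plain loop, so they may vectorize differently from the `foreach` version. That is an assumption to check in the disassembly for your compiler and flags, not a guarantee.

```d
import std.stdio : writeln;

// The scalar loop from the original question.
void add1(int[] a)
{
    foreach (i; 0 .. a.length)
        a[i] += 1;
}

// Hypothetical alternative: the same update as an array operation.
// Whether this ends up using SIMD registers depends on the compiler,
// its version, and the target; inspect the generated code to confirm.
void add1ArrayOp(int[] a)
{
    a[] += 1;
}

void main()
{
    auto x = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto y = x.dup;

    add1(x);
    add1ArrayOp(y);

    // Both versions must produce identical results.
    assert(x == y);
    writeln(y);
}
```

Both functions can be pasted into the same Compiler Explorer view to compare the generated code side by side.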
