On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
...
3. data flow analysis optimizations like constant propagation,
dead code elimination, register allocation, loop invariants,
etc.
Modern compilers (including dmd) do all three.
So if you're comparing code generated by dmd/gdc/ldc, and
notice something that dmd could do better at (1, 2 or 3),
please let me know. Often this sort of thing is low hanging
fruit that is fairly easily inserted into the back end.
...
I once tried to trace the cause of a slowdown in a simple program
and reduced it to https://issues.dlang.org/show_bug.cgi?id=11821.
I think it falls under point 3 in your post (a redundant
instruction in a simple loop). Although 1.5 years have passed,
the issue still stands with 2.068.0.
That's for -m32. The -m64 version of the loop does not seem to me
to have a redundant instruction, but it is still longer than
the output of GDC or LDC.