On Thursday, 30 September 2021 at 16:40:03 UTC, james.p.leblanc wrote:
> D-Ers,
>
> I have been getting counterintuitive results on avx/no-avx timing
> experiments.

This could be a template instantiation culling problem. If the compiler can determine that `Complex!float` is already instantiated (codegen'd) inside Phobos, it may decide not to generate code for it again when you compile your own code with AVX+fastmath enabled. Your program would then link against the Phobos copy, which was built without those options. That could explain why you see no improvement for `Complex!float` but do see one for `Complex!double`. It does not, however, explain why performance is worse with AVX+fastmath than without it.

Generally, for performance issues like this you need to study the assembly output (`--output-s`) or the LLVM IR (`--output-ll`).
The first thing I would look for is whether the functions are being inlined or not.
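
To make that concrete, here is a minimal sketch of the kind of test file I would inspect. The function `sumOfProducts` and the file name `kernel.d` are made up for illustration, they are not from your code. The compile lines in the comments assume LDC's `-O3`, `-mattr=+avx`, `-ffast-math` and `-allinst` switches; treat the `-allinst` run as an experiment to test the culling theory, not as a confirmed fix.

```d
// kernel.d (hypothetical file name)
//
// Assembly output, to check inlining and AVX usage:
//   ldc2 -O3 -mattr=+avx -ffast-math --output-s kernel.d
// LLVM IR output:
//   ldc2 -O3 -mattr=+avx -ffast-math --output-ll kernel.d
// Same again, but asking for codegen of all template instantiations,
// to see whether the Complex!float code changes:
//   ldc2 -O3 -mattr=+avx -ffast-math -allinst --output-s kernel.d

import std.complex : Complex;

// Hypothetical kernel: a small loop that uses Complex!float
// multiplication and addition, so there is something to look at.
Complex!float sumOfProducts(Complex!float[] a, Complex!float[] b)
{
    auto acc = Complex!float(0, 0);
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i];
    return acc;
}

void main()
{
    import std.stdio : writeln;

    auto a = [Complex!float(1, 2), Complex!float(3, 4)];
    auto b = [Complex!float(5, 6), Complex!float(7, 8)];
    writeln(sumOfProducts(a, b));
}
```

In the generated `.s` file, look at the body of `sumOfProducts`: if the loop contains `call` instructions to mangled `std.complex` symbols, the operators were not inlined, and the code being called may be whatever was compiled into Phobos. If the loop contains the multiply/add instructions directly, inlining happened and your flags apply to that code.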

cheers,
  Johan
