On Thursday, 30 September 2021 at 16:40:03 UTC, james.p.leblanc wrote:
D-Ers,
I have been getting counterintuitive results on avx/no-avx timing experiments.
This could be a template instantiation culling problem. If the
compiler is able to determine that `Complex!float` is already
instantiated (with code generated) inside Phobos, then it may decide
not to generate code for it again when you compile your own code with
AVX+fastmath enabled. This could explain why you don't see an
improvement for `Complex!float`, but do see one with
`Complex!double`. It does not, however, explain why performance is
worse with AVX+fastmath than without it.
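One way to probe this hypothesis, as a rough sketch only: replace `Complex!float` in the benchmark with a locally defined type, so that nothing can be reused from the prebuilt Phobos library and all of the arithmetic is compiled with your own flags. The names `LocalComplex` and `mulSum` below are hypothetical and are not taken from your code. If the `float` timings improve with this stand-in, that would point towards culling.

```d
import std.stdio : writeln;

// Hypothetical stand-in for std.complex.Complex. Because it is defined in
// this module, its code is generated with this module's compiler flags.
struct LocalComplex(T)
{
    T re, im;

    // Complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    LocalComplex opBinary(string op : "*")(LocalComplex z) const
    {
        return LocalComplex(re * z.re - im * z.im, re * z.im + im * z.re);
    }

    // Component-wise addition
    LocalComplex opBinary(string op : "+")(LocalComplex z) const
    {
        return LocalComplex(re + z.re, im + z.im);
    }
}

// Hypothetical benchmark kernel: sum of element-wise products.
LocalComplex!T mulSum(T)(LocalComplex!T[] a, LocalComplex!T[] b)
{
    auto acc = LocalComplex!T(0, 0);
    foreach (i; 0 .. a.length)
        acc = acc + a[i] * b[i];
    return acc;
}

void main()
{
    auto a = [LocalComplex!float(1f, 2f), LocalComplex!float(3f, 4f)];
    auto b = [LocalComplex!float(5f, 6f), LocalComplex!float(7f, 8f)];
    writeln(mulSum(a, b));
}
```

Timing this against the `Complex!float` version with the same flags should show whether the Phobos instantiation is what holds the `float` case back.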
Generally, for performance issues like this you need to study the
assembly output (`--output-s`) or the LLVM IR (`--output-ll`).
The first thing I would look for is whether functions are being inlined.
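To make the inlining check concrete, here is a small sketch of a kernel you could compile on its own and inspect. The function name `dot`, the file name `kernel.d`, and the exact flag combination in the comment are assumptions on my part, not your actual setup.

```d
// Assumed build command (adjust to match your real benchmark flags):
//   ldc2 -O3 -mattr=+avx -ffast-math --output-s kernel.d
// Then open kernel.s and look at the loop body of `dot` for each type.
import std.complex : Complex;
import std.stdio : writeln;

// Hypothetical kernel: sum of element-wise complex products.
// The same template body is instantiated for float and for double.
Complex!T dot(T)(Complex!T[] a, Complex!T[] b)
{
    auto acc = Complex!T(0, 0);
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i];
    return acc;
}

void main()
{
    auto af = [Complex!float(1f, 2f), Complex!float(3f, 4f)];
    auto bf = [Complex!float(5f, 6f), Complex!float(7f, 8f)];
    auto ad = [Complex!double(1.0, 2.0), Complex!double(3.0, 4.0)];
    auto bd = [Complex!double(5.0, 6.0), Complex!double(7.0, 8.0)];

    // Using both instantiations forces code for both into the output.
    writeln(dot(af, bf));
    writeln(dot(ad, bd));
}
```

If the loop for one instantiation still contains a `call` to a mangled `opBinary` or `opOpAssign` symbol while the other shows the multiplies and adds directly in the loop, the first one was not inlined.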
cheers,
Johan