On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,
I was playing around with the intel-intrinsics library, trying
to improve the speed of a simple area function. I could not see
any performance improvement over the non-SIMD implementation.
The SIMD version is a little bit slower, even with LDC2 and
-O3. Can anyone help me understand what I am missing?
Thanks!
Bogdan
Your SIMD version does not gain much from using SIMD: the loop
still runs the same number of iterations as the scalar one.
Also, the compiler is probably applying optimizations to the
regular version that end up doing almost the same thing.
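
To illustrate the point about loop length, here is a minimal sketch in plain D. The original area function is not shown in this thread, so areaScalar and areaChunked are hypothetical stand-ins that sum w[i] * h[i]; they are not the poster's code and do not use intel-intrinsics. The only thing the sketch is meant to show is that the second version handles four elements per iteration, so it runs about a quarter as many iterations.

```d
// Hypothetical stand-in for the "area" computation: sum of w[i] * h[i].
// Assumes w and h have the same length.

// Scalar version: one element per iteration, n iterations in total.
float areaScalar(const(float)[] w, const(float)[] h)
{
    float total = 0;
    foreach (i; 0 .. w.length)
        total += w[i] * h[i];
    return total;
}

// Chunked version: four elements per iteration, about n / 4 iterations.
// Explicit SIMD intrinsics would follow the same loop shape; here the
// fixed-size array operation is left to the compiler to vectorize.
float areaChunked(const(float)[] w, const(float)[] h)
{
    float[4] acc = 0;            // four partial sums, one per lane
    size_t i = 0;
    for (; i + 4 <= w.length; i += 4)
        acc[] += w[i .. i + 4] * h[i .. i + 4];

    float total = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < w.length; ++i)    // leftover elements (fewer than four)
        total += w[i] * h[i];
    return total;
}
```

Calling both with, say, w = [1, 2, 3, 4, 5, 6] and h = [2, 2, 2, 2, 2, 2] should give the same result. If a SIMD version keeps the scalar loop's iteration count and only swaps the arithmetic inside for intrinsics, there is little left for it to win, especially if the optimizer already vectorizes the scalar loop.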