On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,

I was playing around with the intel-intrinsics library, trying to improve the speed of a simple area function. I could not see any performance improvement over the non-SIMD implementation. The SIMD version is actually a little slower, even with LDC2 and -O3. Can anyone help me understand what I am missing?

Thanks!
Bogdan

Your algorithm does not gain much from using SIMD: the loop still runs the same number of iterations. Also, the compiler is probably applying optimizations (such as auto-vectorization) to the regular version that end up doing almost the same thing.
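
To illustrate the point about loop length, here is a rough sketch. It is not your code: the function names and the "width times height" area formula are made up for the example, and it assumes the inputs are plain float arrays of equal length. SIMD tends to pay off only when each iteration handles several elements, so the loop runs about a quarter as many times:

```d
import inteli.xmmintrin; // __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps

// Hypothetical scalar version: one area per iteration.
// With -O3, LDC may well auto-vectorize this loop on its own.
void areasScalar(const(float)[] w, const(float)[] h, float[] area)
{
    foreach (i; 0 .. area.length)
        area[i] = w[i] * h[i];
}

// Hypothetical SIMD version: four areas per iteration,
// so the main loop runs roughly length / 4 times.
void areasSimd(const(float)[] w, const(float)[] h, float[] area)
{
    size_t i = 0;
    for (; i + 4 <= area.length; i += 4)
    {
        __m128 a = _mm_loadu_ps(w.ptr + i);  // load 4 widths (unaligned)
        __m128 b = _mm_loadu_ps(h.ptr + i);  // load 4 heights (unaligned)
        _mm_storeu_ps(area.ptr + i, _mm_mul_ps(a, b)); // store 4 products
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < area.length; ++i)
        area[i] = w[i] * h[i];
}
```

If the SIMD loop still steps one element at a time and only uses vector instructions inside the body, the iteration count is unchanged and there is little to gain. Comparing the generated assembly of both versions (for example with LDC's -output-s) should show whether the compiler already vectorized the scalar one.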
