On Friday, 3 November 2023 at 15:32:08 UTC, Sergey wrote:
On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,
I was playing around with the intel-intrinsics library, trying
to improve the speed of a simple area function. I could not
see any performance improvement over the non-SIMD
implementation. The SIMD version is even a little slower,
with LDC2 and -O3. Can anyone help me understand what I am
missing?
Thanks!
Bogdan
Your algorithm doesn't gain much from using SIMD:
the loop length is the same.
Also, the compiler is probably applying some optimizations to the
regular version that do almost the same thing.
I think the result came from the way I was running the benchmark. Timing 100,000 calls in a loop:
```d
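import std.datetime.systime : Clock;
import std.stdio : writeln;

// Time 100,000 calls of the scalar version
auto begin = Clock.currTime;
foreach (i; 0..100_000) {
    res1 = areaMeters(polygon);
}
writeln("No SIMD ", Clock.currTime - begin);

// Time 100,000 calls of the SIMD version
begin = Clock.currTime;
foreach (i; 0..100_000) {
    res2 = areaMetersSimd2(polygon);
}
writeln("SIMD ", Clock.currTime - begin);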
```
gives me:
```
No SIMD 1 sec, 80 ms, 765 μs, and 1 hnsec
SIMD 1 sec, 120 ms, 765 μs, and 1 hnsec
```
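whereas timing a single call of each version:
```d
import std.datetime.systime : Clock;
import std.stdio : writeln;

// Time one call of the scalar version
auto begin = Clock.currTime;
res1 = areaMeters(polygon);
writeln("No SIMD ", Clock.currTime - begin);

// Time one call of the SIMD version
begin = Clock.currTime;
res2 = areaMetersSimd2(polygon);
writeln("SIMD ", Clock.currTime - begin);
```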
gives me:
```
No SIMD 19 μs and 3 hnsecs
SIMD 16 μs and 8 hnsecs
```
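For comparison, the loop numbers work out to about 10.8 μs per call (scalar) and 11.2 μs per call (SIMD), while the single-call numbers are 19.3 μs and 16.8 μs. A single measurement of a few microseconds probably also includes one-off costs such as cold caches, so it is a shaky basis for comparing the two versions. Below is a minimal sketch of a steadier harness, assuming a planar polygon stored as parallel x/y arrays. `shoelaceArea`, `nsecsPerCall` and `sink` are hypothetical names, not part of the code above or of intel-intrinsics. The harness does untimed warm-up calls, times many iterations with `StopWatch` (which uses the monotonic `MonoTime` clock rather than the wall-clock `Clock.currTime`), and accumulates every result so the optimizer cannot discard the calls.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : abs;
import std.stdio : writefln;

// Hypothetical stand-in for areaMeters: planar shoelace area of a polygon
// given as parallel x/y coordinate arrays. Not the original function.
double shoelaceArea(const double[] xs, const double[] ys)
{
    double sum = 0;
    foreach (i; 0 .. xs.length)
    {
        immutable j = (i + 1) % xs.length; // next vertex, wrapping around
        sum += xs[i] * ys[j] - xs[j] * ys[i];
    }
    return abs(sum) / 2;
}

// Hypothetical harness: average time per call in nanoseconds over
// `iterations` timed calls, after `warmup` untimed calls. Every result is
// added to `sink` so the calls cannot be optimized away as dead code.
double nsecsPerCall(alias fun, Args...)(size_t warmup, size_t iterations,
                                        ref double sink, Args args)
{
    foreach (_; 0 .. warmup)
        sink += fun(args);

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        sink += fun(args);
    sw.stop();

    return cast(double) sw.peek.total!"nsecs" / iterations;
}

void main()
{
    // Unit square, so each call returns 1.0
    double[] xs = [0.0, 1.0, 1.0, 0.0];
    double[] ys = [0.0, 0.0, 1.0, 1.0];
    double sink = 0;

    immutable ns = nsecsPerCall!shoelaceArea(1_000, 100_000, sink, xs, ys);
    // Printing sink keeps the accumulated results observable
    writefln("shoelace: %.1f ns/call (sink = %s)", ns, sink);
}
```

To compare the real functions, the same harness could be called once with `areaMeters` and once with `areaMetersSimd2`, passing `polygon` and identical warm-up and iteration counts, so both versions are measured under the same conditions.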