https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from [email protected] ---
If desired, I can post my benchmark code.  It uses more instructions than
the zfsonlinux variant (I used SIMD intrinsics instead of inline assembly).
The extra instructions mostly just shuffle values between registers.
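For readers without the benchmark code, here is a minimal sketch of what an intrinsics-based Fletcher-4 can look like. It assumes x86-64 with GCC or Clang, because it uses `__attribute__((target("avx2")))` and `__builtin_cpu_supports`, and it falls back to a portable path otherwise. Each of the four 64-bit lanes of a `__m256i` accumulates every fourth 32-bit word. After the loop, the lanes are read through a union instead of an explicit `_mm256_storeu_si256` (vmovdqu), and they are combined with constant multiplications. I derived those constants by expanding the recurrences and checked them against a plain reference loop. All function names and the union are hypothetical: this is not the commenter's benchmark code and not the zfsonlinux implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#endif

/* Reference: the plain Fletcher-4 recurrence, all sums mod 2^64. */
void fletcher4_ref(const uint32_t *buf, size_t nwords, uint64_t out[4])
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    for (size_t i = 0; i < nwords; i++) {
        a += buf[i]; b += a; c += b; d += c;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/* Lane j has summed words j, j+4, j+8, ...; these constant multiplications
 * turn the per-lane sums into the single-stream checksum (derived by
 * expanding the recurrences; unsigned wraparound keeps them exact mod 2^64). */
static void fletcher4_combine(const uint64_t a[4], const uint64_t b[4],
                              const uint64_t c[4], const uint64_t d[4],
                              uint64_t out[4])
{
    out[0] = a[0] + a[1] + a[2] + a[3];
    out[1] = 4 * (b[0] + b[1] + b[2] + b[3]) - a[1] - 2 * a[2] - 3 * a[3];
    out[2] = 16 * (c[0] + c[1] + c[2] + c[3])
           - 6 * b[0] - 10 * b[1] - 14 * b[2] - 18 * b[3] + a[2] + 3 * a[3];
    out[3] = 64 * (d[0] + d[1] + d[2] + d[3])
           - 48 * c[0] - 64 * c[1] - 80 * c[2] - 96 * c[3]
           + 4 * b[0] + 10 * b[1] + 20 * b[2] + 34 * b[3] - a[3];
}

/* Portable fallback: the same four lanes, one element at a time. */
static void lanes_scalar(const uint32_t *buf, size_t nwords, uint64_t a[4],
                         uint64_t b[4], uint64_t c[4], uint64_t d[4])
{
    for (int j = 0; j < 4; j++)
        a[j] = b[j] = c[j] = d[j] = 0;
    for (size_t i = 0; i < nwords; i += 4)
        for (int j = 0; j < 4; j++) {
            a[j] += buf[i + j]; b[j] += a[j]; c[j] += b[j]; d[j] += c[j];
        }
}

#ifdef HAVE_AVX2_PATH
/* Hypothetical union for reading lanes without an explicit vmovdqu store;
 * the compiler decides whether this becomes a spill/reload or lane extracts. */
typedef union { __m256i v; uint64_t u[4]; } lanes_t;

__attribute__((target("avx2")))
static void lanes_avx2(const uint32_t *buf, size_t nwords, uint64_t a[4],
                       uint64_t b[4], uint64_t c[4], uint64_t d[4])
{
    __m256i va = _mm256_setzero_si256(), vb = va, vc = va, vd = va;
    for (size_t i = 0; i < nwords; i += 4) {
        /* Load 4 x uint32 and zero-extend to 4 x uint64 (lane j = buf[i+j]). */
        __m256i w = _mm256_cvtepu32_epi64(
            _mm_loadu_si128((const __m128i *)(buf + i)));
        va = _mm256_add_epi64(va, w);
        vb = _mm256_add_epi64(vb, va);
        vc = _mm256_add_epi64(vc, vb);
        vd = _mm256_add_epi64(vd, vc);
    }
    /* Intermediate sum loop done: alias the accumulators to read lanes. */
    lanes_t A = { .v = va }, B = { .v = vb }, C = { .v = vc }, D = { .v = vd };
    for (int j = 0; j < 4; j++) {
        a[j] = A.u[j]; b[j] = B.u[j]; c[j] = C.u[j]; d[j] = D.u[j];
    }
}
#endif

void fletcher4_simd(const uint32_t *buf, size_t nwords, uint64_t out[4])
{
    uint64_t a[4], b[4], c[4], d[4];
    assert(nwords % 4 == 0); /* tail handling omitted in this sketch */
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        lanes_avx2(buf, nwords, a, b, c, d);
    else
#endif
        lanes_scalar(buf, nwords, a, b, c, d);
    fletcher4_combine(a, b, c, d, out);
}
```

The scalar fallback computes exactly the same lanes, so comparing `fletcher4_simd` with `fletcher4_ref` still checks the combine coefficients on machines without AVX2.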
After the intermediate sum loop completes, I read the lanes of the __m256i
accumulators directly (by aliasing them) instead of storing them to memory
with vmovdqu before the constant multiplications.  I suspect the compiler was
able to shuffle registers around enough to avoid some trips to memory.  Also,
I think the Intel whitepaper isn't quite fair to itself: it seems to compare
its SIMD variant against the best possible non-SIMD performance, which comes
from the loop unrolled 4 times, not from the original loop.
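To make the baseline question concrete, here is a minimal C sketch of two non-SIMD candidates: the original one-word-per-iteration loop and the same loop unrolled 4 times. The names are hypothetical, and this is one plain reading of "unrolled 4 times", not the whitepaper's actual baseline code. Both must produce identical checksums. If the unrolled loop is indeed faster on a given machine, a SIMD speedup measured against it will look smaller than one measured against the original loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The original Fletcher-4 loop: one 32-bit word per iteration. */
void fletcher4_loop(const uint32_t *buf, size_t nwords, uint64_t out[4])
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    for (size_t i = 0; i < nwords; i++) {
        a += buf[i];
        b += a;
        c += b;
        d += c;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/* Same recurrence unrolled 4 times (nwords must be a multiple of 4).
 * One loop test/branch per four words, but every word still feeds
 * a -> b -> c -> d in sequence, so the dependency chain is unchanged. */
void fletcher4_unrolled(const uint32_t *buf, size_t nwords, uint64_t out[4])
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    assert(nwords % 4 == 0);
    for (size_t i = 0; i < nwords; i += 4) {
        a += buf[i];     b += a; c += b; d += c;
        a += buf[i + 1]; b += a; c += b; d += c;
        a += buf[i + 2]; b += a; c += b; d += c;
        a += buf[i + 3]; b += a; c += b; d += c;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}
```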

_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs