Re: [Lzip-bug] Speedup by including intrinsics for vectorization

Antonio Diaz Diaz Thu, 13 Oct 2016 10:27:06 -0700

Hello Erick,

Erick Couts II wrote:

> I wanted to point out that no SSE intrinsics were included in the source
> code in order to vectorize the encoding process.  I've found that a small
> but decent speed gain can be achieved by including the immintrin.h header
> and then compiling with auto-vectorization enabled in GCC and LTO for
> linktime.  I also profiled the program after running with the --best option
> in order to further optimize the program.


What program? There are several programs in the lzip family.

I have tried your suggestion and I haven't noticed any increase in speed for '-9' in lzip-1.18 after including 'immintrin.h' (or 'ammintrin.h') and compiling with '-O3 -flto' on an AMD Athlon64 X2.

> I know that you like to keep code simple, but just adding in the #include
> immintrin.h to the headers will allow for auto-vectorization without
> requiring further changes to any of the existing code.

I like to keep code simple and portable. For example, 'immintrin.h' does not exist on the computer from which I'm writing this.
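
To illustrate the portability concern, here is a minimal sketch of how such a header could be included only where it exists, assuming a compiler that provides '__has_include' (standard since C++17, an extension in some older modes). The 'HAVE_IMMINTRIN' macro and the 'byte_sum' function are hypothetical names for illustration and are not part of lzip. The loop uses no intrinsics, so whether a compiler vectorizes it should depend on the optimization flags and the target, not on the header being included.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Include the intrinsics header only on systems that have it.
// HAVE_IMMINTRIN is a hypothetical macro name, not something lzip defines.
#if defined(__has_include)
#  if __has_include(<immintrin.h>)
#    include <immintrin.h>
#    define HAVE_IMMINTRIN 1
#  endif
#endif
#ifndef HAVE_IMMINTRIN
#  define HAVE_IMMINTRIN 0
#endif

// Hypothetical helper (not taken from lzip): sums the bytes of a buffer.
// It is a plain loop with no intrinsics; the compiler is free to
// vectorize it or not, with or without the header above.
std::uint32_t byte_sum(const std::uint8_t* data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < size; ++i)
    sum += data[i];
  return sum;
}
```

With GCC versions that support it, compiling with '-O3 -fopt-info-vec' reports which loops were vectorized, which is one way to check whether adding the header changes anything on a given machine.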

But the problem with these optimization hacks is that they tend not to be reproducible in other environments, as seems to be the case for this one.



Best regards,
Antonio.

_______________________________________________
Lzip-bug mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lzip-bug
