Hello Erick,

Erick Couts II wrote:
I wanted to point out that no SSE intrinsics were included in the source
code in order to vectorize the encoding process.  I've found that a small
but decent speed gain can be achieved by including the immintrin.h header
and then compiling with auto-vectorization enabled in GCC and LTO for
linktime.  I also profiled the program after running with the --best option
in order to further optimize the program.

What program? There are several programs in the lzip family.

I have tried your suggestion and I haven't noticed any increase in speed for '-9' in lzip-1.18 after including 'immintrin.h' (or 'ammintrin.h') and compiling with '-O3 -flto' on an AMD Athlon64 X2.

I know that you like to keep code simple, but just adding in the #include
immintrin.h to the headers will allow for auto-vectorization without
requiring further changes to any of the existing code.

I like to keep code simple and portable. For example, 'immintrin.h' does not exist on the computer from which I'm writing this.

But the problem with these optimization hacks is that they tend not to be reproducible in other environments, as seems to be the case for this one.
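To illustrate the portability point, here is a minimal sketch of how a header that may be missing could be guarded. It assumes a compiler that supports '__has_include' (C++17, or GCC 5 and later as an extension). The names 'HAVE_IMMINTRIN' and 'sum_bytes' are hypothetical and are not part of lzip.

```cpp
#include <cstddef>
#include <cstdint>

// Include 'immintrin.h' only if the compiler can find it.
// HAVE_IMMINTRIN is a hypothetical macro used only in this sketch.
#if defined(__has_include)
#  if __has_include(<immintrin.h>)
#    include <immintrin.h>
#    define HAVE_IMMINTRIN 1
#  endif
#endif
#ifndef HAVE_IMMINTRIN
#  define HAVE_IMMINTRIN 0
#endif

// Plain loop without intrinsics; it compiles the same way whether or
// not the header above was found.
inline std::uint32_t sum_bytes( const std::uint8_t * const buf,
                                const std::size_t size )
  {
  std::uint32_t sum = 0;
  for( std::size_t i = 0; i < size; ++i ) sum += buf[i];
  return sum;
  }
```

As far as I know, GCC decides whether to auto-vectorize a loop like this from the optimization options ('-O3' enables '-ftree-vectorize'), not from the presence of the header, which only declares the intrinsic functions.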

Best regards,

Lzip-bug mailing list
