Erick Couts II wrote:
I wanted to point out that no SSE intrinsics were included in the source
code in order to vectorize the encoding process. I've found that a small
but decent speed gain can be achieved by including the immintrin.h header
and then compiling with auto-vectorization enabled in GCC and LTO enabled
at link time. I also profiled the program after running with the --best option
in order to further optimize the program.
What program? There are several programs in the lzip family.
I have tried your suggestion and I haven't noticed any increase in speed
for '-9' in lzip-1.18 after including 'immintrin.h' (or 'ammintrin.h')
and compiling with '-O3 -flto' on an AMD Athlon64 X2.
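A possible explanation, offered as a sketch and not as a measurement: GCC's auto-vectorizer works on ordinary loops and is turned on by '-O3' (or '-ftree-vectorize'); including an intrinsics header only declares the intrinsic functions and does not by itself change how existing loops are compiled. The function below is a hypothetical example, not code from lzip. It shows the kind of plain loop GCC may vectorize with no header at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only (not part of lzip).
// A plain reduction loop with no intrinsics; at -O3 GCC may vectorize it
// on its own, whether or not any intrinsics header is included.
std::uint32_t sum_bytes(const std::uint8_t *data, std::size_t size)
{
  assert(data != nullptr || size == 0);  // an empty buffer may be null
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < size; ++i)
    sum += data[i];                      // each byte is widened before adding
  return sum;
}
```

Compiling this with 'g++ -O3 -fopt-info-vec' asks GCC to report which loops it vectorized, which is one way to check whether a given build change had any effect on a given machine.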
I know that you like to keep code simple, but just adding in the #include
immintrin.h to the headers will allow for auto-vectorization without
requiring further changes to any of the existing code.
I like to keep code simple and portable. For example, 'immintrin.h' does
not exist on the computer from which I'm writing this.
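To illustrate the portability cost: an unconditional '#include <immintrin.h>' fails to compile wherever the header is missing, so it would need a guard along the lines of the sketch below. This is only an assumption about how such a guard could look, not something lzip does. The macro HAVE_IMMINTRIN_H and the function have_immintrin() are hypothetical names, and '__has_include' requires a reasonably recent compiler (it is standard since C++17).

```cpp
// Sketch only: include <immintrin.h> when targeting x86 and the header exists.
// HAVE_IMMINTRIN_H is a hypothetical macro name chosen for this example.
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined(__has_include)
#    if __has_include(<immintrin.h>)
#      include <immintrin.h>
#      define HAVE_IMMINTRIN_H 1
#    endif
#  endif
#endif

#ifndef HAVE_IMMINTRIN_H
#  define HAVE_IMMINTRIN_H 0   // header unavailable, or not an x86 target
#endif

// Reports at run time what the preprocessor decided at compile time.
bool have_immintrin() { return HAVE_IMMINTRIN_H != 0; }
```

Even this small guard adds several preprocessor branches to every file that uses it, which is the kind of complexity the reply above argues against.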
But the problem with these optimization hacks is that they tend not to
be reproducible in other environments. As seems to be the case for
Lzip-bug mailing list