[Lzip-bug] Speedup by including intrinsics for vectorization

Erick Couts II Thu, 13 Oct 2016 03:53:51 -0700

Hello,
I wanted to point out that the source code includes no SSE intrinsics to
vectorize the encoding process.  I've found that a small but decent speed
gain can be achieved by including the immintrin.h header and then compiling
with auto-vectorization enabled in GCC and with LTO at link time.  I also
profiled the program after running it with the --best option in order to
optimize it further.  The run took 57 sec with the optimizations versus
69 sec without on a 134 MB file (the contents of the MS Reserved Partition
passed through dd).  I would recommend looking into adding the intrinsics
header so that GCC can automatically optimize the compilation for the CPU
in use.  Including a header that covers a later CPU will not add intrinsics
to the program that the CPU cannot handle.
While I have seen a speed increase, the final binary also grew by about
4 KB.
I know that you like to keep the code simple, but just adding
#include <immintrin.h> to the headers will allow for auto-vectorization
without requiring further changes to any of the existing code.
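
For illustration only, here is a minimal sketch of the kind of loop that GCC
may auto-vectorize.  It is not taken from the lzip sources: the function
name count_mismatches and the flag combination in the comment are
assumptions made for this example.  The loop itself uses no intrinsics;
whether GCC actually vectorizes it depends on the optimization level and
target flags, which -fopt-info-vec can report.

```cpp
// Hypothetical example, not part of lzip.
// One possible GCC invocation (the flags exist in GCC; choosing this
// particular combination is an assumption):
//   g++ -O3 -march=native -flto -fopt-info-vec -c example.cc
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the positions at which two equally sized buffers differ.
// The loop has a fixed trip count, no early exit, and a simple
// reduction, which is the shape auto-vectorizers tend to handle.
std::size_t count_mismatches( const std::vector<std::uint8_t> & a,
                              const std::vector<std::uint8_t> & b )
  {
  assert( a.size() == b.size() );       // precondition: same length
  const std::uint8_t * const pa = a.data();
  const std::uint8_t * const pb = b.data();
  const std::size_t n = a.size();
  std::size_t count = 0;
  for( std::size_t i = 0; i < n; ++i )
    count += ( pa[i] != pb[i] );        // adds 0 or 1, no branch needed
  return count;
  }
```

Loops with data-dependent early exits, such as a typical match-length
search, are usually much harder for the compiler to vectorize, so gains
from this approach may vary from one part of an encoder to another.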
Anyway, hope this helps!

_______________________________________________
Lzip-bug mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lzip-bug
