I wanted to point out that no SSE intrinsics are used in the source code
to vectorize the encoding process.  I've found that a small but decent
speed gain can be achieved by including the immintrin.h header and then
compiling with GCC's auto-vectorization and link-time optimization (LTO)
enabled.  I also profiled a run with the --best option in order to
optimize the program further.  The result was 57 sec with optimization
versus 69 sec without on a 134 MB file (contents of the MS
Reserved Partition passed through dd).  I would recommend looking into
adding the intrinsic header so as to allow GCC to automatically optimize
the compilation based upon what CPU is in use.  Including a header for a
later CPU will not add intrinsics which the CPU cannot handle to the
binary.  While I have seen a speed increase, the final binary also grew by
about 4 KB.
I know that you like to keep code simple, but just adding
#include <immintrin.h> to the headers will allow for auto-vectorization
without requiring further changes to any of the existing code.
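
To illustrate the kind of loop I mean, here is a minimal sketch.  It is a
hypothetical example and not taken from the lzip sources: the function name
count_equal_bytes and the file name vec_sketch.cc are made up, and the build
flags in the comment are assumptions about a typical GCC setup, not a record
of a specific build.  As far as I understand, GCC's vectorizer is driven by
options such as -O3 (or -ftree-vectorize) and -march=native, so a plain loop
like this one needs no intrinsics in the source.

```cpp
// Hypothetical sketch, not part of lzip.
// Assumed GCC invocation (flags are illustrative):
//   g++ -O3 -march=native -flto -fopt-info-vec -c vec_sketch.cc
// -fopt-info-vec asks GCC to report which loops it vectorized.
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts the positions at which two buffers hold the same byte.
// Each iteration is independent apart from the running sum, which is
// the kind of simple reduction a vectorizer can usually handle.
std::size_t count_equal_bytes(const std::uint8_t* a,
                              const std::uint8_t* b,
                              std::size_t n)
{
  // Null pointers are only acceptable for empty buffers.
  assert((a != nullptr && b != nullptr) || n == 0);

  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    count += (a[i] == b[i]);  // adds 1 on a match, 0 otherwise
  return count;
}
```

Whether a given loop in the encoder actually gets vectorized would have to
be checked against the compiler's report for that build.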
Anyway, hope this helps!

Lzip-bug mailing list
