I stumbled upon an optimized assembler decompression routine that can be assembled under Linux with nasm or gas. In version 18.03 of the LZMA SDK, its author introduced an x86_64 assembler version of the LzmaDec_DecodeReal_3() function that makes decompression around 30% faster than the C version. This function has been adapted to nasm by Peter Hyman in his `lrzip-next' project[1]. There is also a gas version made by Conor McCarthy in his `fxz'[2] and `fast-lzma2'[3]. The original masm version can be assembled with asmc[4] and is used in the new p7zip[5].
So I thought that if this optimization can improve LZMA decompression in those projects, it can do the same in lzip. I have tested it (with fxz, to be more precise) and it really is much faster (about 30%-32%). So I propose taking a look at it, as it is worth a bit of attention.

[1] https://github.com/pete4abw/lrzip-next/raw/master/src/lzma/ASM/x86/LzmaDecOpt.asm
[2] https://github.com/conor42/fxz/raw/master/src/liblzma/lzma/lzma_dec_x86_64.S
    (I think this one is the most promising candidate for gcc/gas)
[3] https://github.com/conor42/fast-lzma2/raw/master/lzma_dec_x86_64.S
    https://github.com/conor42/fast-lzma2/raw/master/lzma_dec_x86_64.asm
[4] https://github.com/nidud/asmc
[5] https://github.com/jinfeihan57/p7zip/raw/7zip_21.02/Asm/x86/LzmaDecOpt.asm