A few things:

1. It is 150% of the speed, not 150% faster, so it is only 50% faster.
Also, that figure is an average. Looking at each of the files in the
Silesia corpus individually, real-world decompression speedups seem to
range from 35% to 55%. 50% is the rough average, but only at -O3; at -O2
Steven's decoder is slightly slower.

2. The table is garbled. Here are the results from my Intel Core 2 Quad
Q9550 on an uncompressed tar archive of the Silesia corpus:

$ ./fullbench -d /tmp/silesia.tar
*** LZ4/LZJB speed analyzer  64-bits, by Yann Collet (with LZJB hacks by Strontium) (Oct 24 2013) ***
 /tmp/silesia.tar :

LZ4_decompress_fast        : 211957760 ->  1016.6 MB/s
LZ4_decompress_fast_withPr : 211957760 ->  1017.4 MB/s
LZ4_decompress_safe        : 211957760 ->   967.8 MB/s
LZ4_decompress_safe_withPr : 211957760 ->   967.1 MB/s
LZ4_decompress_safe_partia : 211957760 ->   965.6 MB/s
ZFS lzjb_decompress        : 211957760 ->   339.3 MB/s
BSD lzjb_decompress        : 211957760 ->   307.4 MB/s
HAX lzjb_decompress        : 211957760 ->   501.5 MB/s
 ** TOTAL ** :
LZ4_decompress_fast   : 211957760 -> 1016.6 MB/s
LZ4_decompress_fast_w : 211957760 -> 1017.4 MB/s
LZ4_decompress_safe   : 211957760 ->  967.8 MB/s
LZ4_decompress_safe_w : 211957760 ->  967.1 MB/s
LZ4_decompress_safe_p : 211957760 ->  965.6 MB/s
ZFS lzjb_decompress   : 211957760 ->  339.3 MB/s
BSD lzjb_decompress   : 211957760 ->  307.4 MB/s
HAX lzjb_decompress   : 211957760 ->  501.5 MB/s

That is at -O3. Here is -O2:

$ ./fullbenchO2 -d /tmp/silesia.tar
*** LZ4/LZJB speed analyzer  64-bits, by Yann Collet (with LZJB hacks by Strontium) (Oct 24 2013) ***
 /tmp/silesia.tar :

LZ4_decompress_fast        : 211957760 ->  1015.0 MB/s
LZ4_decompress_fast_withPr : 211957760 ->  1016.6 MB/s
LZ4_decompress_safe        : 211957760 ->   972.7 MB/s
LZ4_decompress_safe_withPr : 211957760 ->   995.1 MB/s
LZ4_decompress_safe_partia : 211957760 ->   972.3 MB/s
ZFS lzjb_decompress        : 211957760 ->   340.0 MB/s
BSD lzjb_decompress        : 211957760 ->   311.8 MB/s
HAX lzjb_decompress        : 211957760 ->   478.6 MB/s
 ** TOTAL ** :
LZ4_decompress_fast   : 211957760 -> 1015.0 MB/s
LZ4_decompress_fast_w : 211957760 -> 1016.6 MB/s
LZ4_decompress_safe   : 211957760 ->  972.7 MB/s
LZ4_decompress_safe_w : 211957760 ->  995.1 MB/s
LZ4_decompress_safe_p : 211957760 ->  972.3 MB/s
ZFS lzjb_decompress   : 211957760 ->  340.0 MB/s
BSD lzjb_decompress   : 211957760 ->  311.8 MB/s
HAX lzjb_decompress   : 211957760 ->  478.6 MB/s

Interestingly, most of the other decompressors are slightly faster at -O2
than at -O3 (the two LZ4_decompress_fast variants are the exception, by
well under 1%), while Steven's is about 5% faster at -O3 than at -O2. It
might be possible to obtain the -O3 performance at -O2 by prefixing the
function with something like:

#if defined(__GNUC__) && !defined(__clang__) /* Clang defines __GNUC__ but ignores optimize() */
__attribute__((optimize("unroll-loops")))
#endif
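To make the arithmetic in point 1 concrete, here is a small C sketch (the function names are hypothetical, chosen for illustration) that turns two MB/s figures into "percent of the speed" and "percent faster". With the silesia.tar numbers above, percent_faster(501.5, 339.3) is about 47.8 at -O3 and percent_faster(478.6, 340.0) is about 40.8 at -O2; likewise, the 133.71% DIFF in Steven's enwik8 row means about 33.7% faster.

```c
#include <assert.h>

/* Speed of new_mbps relative to base_mbps, as a percentage.
   100 means equal speed; 150 means "150% of the speed". */
static double percent_of_speed(double new_mbps, double base_mbps)
{
    assert(base_mbps > 0.0);
    return new_mbps / base_mbps * 100.0;
}

/* How much faster new_mbps is than base_mbps, as a percentage.
   150% of the speed is only 50% faster: subtract the 100% baseline. */
static double percent_faster(double new_mbps, double base_mbps)
{
    return percent_of_speed(new_mbps, base_mbps) - 100.0;
}
```

The only step that matters is subtracting the 100% baseline; reading a ratio column such as DIFF directly as "percent faster" overstates the gain by exactly 100 points.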

On 10/24/2013 11:15 AM, Strontium wrote:
> Hi all,
> 
> After a conversation on IRC with Ryao about lzjb performance and the
> proposed BSD LZJB decompressor, I decided to modify the LZ4
> benchmark code and wedge in lzjb from ZFS to compare them.
> 
> I have published the code and the results here:
> https://github.com/stevenj/lzjbbench
> 
> In the process I hacked up an experimental lzjb decompression
> implementation. It is not based on the existing code; it decodes the
> bit stream from scratch.
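For reference, the LZJB stream these decoders parse is simple: a control byte holds 8 flags, read least significant bit first; a 0 flag means one literal byte follows, and a 1 flag means a 2-byte match token whose top 6 bits are the length minus 3 and whose low 10 bits are the back offset. Below is a minimal byte-at-a-time decoder written from my reading of the ZFS lzjb.c source. It is a sketch, not Steven's HAX decoder and not the ZFS code verbatim; the name lzjb_decode_ref is hypothetical, and it adds explicit input bounds checks and rejects a zero offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LZJB parameters, as in the ZFS lzjb.c source. */
#define LZJB_MATCH_BITS  6   /* high 6 bits of a match token: length - LZJB_MATCH_MIN */
#define LZJB_MATCH_MIN   3   /* shortest encodable match */
#define LZJB_OFFSET_MASK ((1 << (16 - LZJB_MATCH_BITS)) - 1) /* low 10 bits: offset */

/* Hypothetical reference decoder: fills exactly dst_len bytes of dst.
   Returns 0 on success, -1 on truncated or malformed input. */
static int lzjb_decode_ref(const uint8_t *src, size_t src_len,
                           uint8_t *dst, size_t dst_len)
{
    const uint8_t *s_end = src + src_len;
    uint8_t *d = dst;
    uint8_t *d_end = dst + dst_len;
    unsigned copymap = 0;
    unsigned copymask = 0x80; /* first shift reaches 0x100 and loads a control byte */

    assert(dst != NULL || dst_len == 0);
    while (d < d_end) {
        copymask <<= 1;
        if (copymask == 0x100) {          /* all 8 flags consumed */
            if (src >= s_end)
                return -1;
            copymap = *src++;
            copymask = 1;                 /* flags are read LSB first */
        }
        if (copymap & copymask) {         /* flag 1: 2-byte match token */
            if (s_end - src < 2)
                return -1;
            size_t mlen = (size_t)(src[0] >> (8 - LZJB_MATCH_BITS)) + LZJB_MATCH_MIN;
            size_t offset = (size_t)(((src[0] << 8) | src[1]) & LZJB_OFFSET_MASK);
            src += 2;
            if (offset == 0 || offset > (size_t)(d - dst))
                return -1;                /* zero offset, or reaches before the output start */
            if (mlen > (size_t)(d_end - d))
                mlen = (size_t)(d_end - d); /* clamp at the end of the output */
            const uint8_t *cpy = d - offset;
            while (mlen-- > 0)
                *d++ = *cpy++;            /* byte copy: may overlap its own output */
        } else {                          /* flag 0: one literal byte */
            if (src >= s_end)
                return -1;
            *d++ = *src++;
        }
    }
    return 0;
}
```

For example, the 6-byte stream {0x08, 'a', 'b', 'c', 0x0C, 0x03} decodes to "abcabcabc": three literals, then a match of length 6 at offset 3 that overlaps its own output, which is why this baseline copies one byte at a time.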
> 
> In the results my decoder is identified as "HAX_lzjb_decompress"
> 
> Sample results:
> ALGORITHM            FILE NAME    FILE SIZE  COMPRESSED SIZE  BLOCK SIZE    MB/s     DIFF
> HAX_lzjb_decompress  enwik8       100000000         68721036     1048576   443.8  133.71%
> ZFS_lzjb_decompress  enwik8       100000000         78636337        1024   331.9
> HAX_lzjb_decompress  silesia.zip   68182744         76529235        1024    2635  579.50%
> ZFS_lzjb_decompress  silesia.zip   68182744         76486571     4194304   454.7
> HAX_lzjb_decompress  mozilla       51220480         29853404        4096   616.9  150.68%
> ZFS_lzjb_decompress  mozilla       51220480         28868591     4194304   409.4
> HAX_lzjb_decompress  webster       41458703         26566596        4096   466.6  138.37%
> ZFS_lzjb_decompress  webster       41458703         30135465        1024   337.2
> HAX_lzjb_decompress  enwik8.zip    36445475         40985240     1048576  2792.3  614.64%
> ZFS_lzjb_decompress  enwik8.zip    36445475         40985489       65536   454.3
> HAX_lzjb_decompress  nci           33553445         11088497        1024   736.7  120.91%
> BSD_lzjb_decompress  nci           33553445          8714892     4194304   609.3
> 
> Each of these is my algorithm's WORST result vs. the alternatives' BEST.
> This is built with -O3 and run on an AMD FX-8150, and is pure C.
> 
> My github has the full spreadsheet with all the data if anyone is 
> interested.
> 
> Some things I would like to qualify. My algorithm has had no substantial
> speed tweaking; it's just a first attempt at a faster method.
> It primarily works by overcopying and using 8-byte transfers wherever
> possible. Basically, the theory is that it's just as expensive to write one
> byte to memory as it is to write 8 (at least on a 64-bit machine), so I
> write 8 and then adjust the pointers (which are cheap register operations).
> It also picks up some easy-to-optimize corner cases, which is why it
> performs so well when decompressing incompressible data. I know there is
> still room for improvement.
> It's hacky and I haven't cleaned it up; it's a single day's coding, so I am
> sure it can be a lot nicer.
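To illustrate the overcopy idea described above, here is a sketch of a word-at-a-time match copy, similar in spirit to the "wild copy" used in LZ4 decoders. It is my own illustration, not Steven's code; the function name is hypothetical, and it assumes a 64-bit target plus a caller that reserves at least 7 bytes of slack past the end of the output buffer. Offsets below 8 fall back to a byte loop, because an 8-byte load there would read bytes that have not been written yet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a match of len bytes that starts offset bytes behind dst.
   With offset >= 8, each 8-byte load reads only bytes already written,
   so the copy can move a whole word per step. It may write up to 7 bytes
   past dst + len; later output overwrites them, but the buffer must have
   that slack. Returns the true end of the match. */
static uint8_t *copy_match_overcopy(uint8_t *dst, size_t offset, size_t len)
{
    const uint8_t *src = dst - offset;
    uint8_t *end = dst + len;

    assert(offset > 0);
    if (offset >= 8) {
        while (dst < end) {
            uint64_t w;
            memcpy(&w, src, sizeof w);  /* unaligned-safe 8-byte load */
            memcpy(dst, &w, sizeof w);  /* unaligned-safe 8-byte store */
            src += 8;
            dst += 8;
        }
    } else {
        /* Source and destination overlap within one word: go byte by byte. */
        while (dst < end)
            *dst++ = *src++;
    }
    return end;
}
```

For example, copy_match_overcopy(buf + 10, 10, 20) on a buffer whose first 10 bytes are "0123456789" takes three 8-byte transfers and spills 4 bytes past buf + 30, which is harmless only because the buffer has slack there.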
> 
> The LZ4 test suite is good: it tries, as much as it can, to test ONLY the
> speed of decompression or compression and to eliminate IO. This is good,
> because IO is a variable but the efficiency of the algorithm is not. An
> inefficient algorithm may look much better than it really is if slow IO is
> allowed to cloud the result.
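A minimal sketch of that IO-free measurement style, assuming POSIX clock_gettime: load the input and allocate the output in memory first, then time only the decode call. The decode_fn signature, copy_decoder, and bench_mbps are hypothetical names for this sketch, not the fullbench API. It reports the best of several runs to reduce noise and counts 10^6 bytes per MB, which may differ from fullbench's own conventions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical decoder signature, used only for this sketch. */
typedef void (*decode_fn)(const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t dst_len);

/* Stand-in "decoder" that only copies; useful as a memcpy upper bound. */
static void copy_decoder(const unsigned char *src, size_t src_len,
                         unsigned char *dst, size_t dst_len)
{
    memcpy(dst, src, src_len < dst_len ? src_len : dst_len);
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Best-of-runs throughput in MB/s of output (1 MB = 10^6 bytes here).
   Both buffers already live in memory, so no file I/O is timed. */
static double bench_mbps(decode_fn fn, const unsigned char *src, size_t src_len,
                         unsigned char *dst, size_t dst_len, int runs)
{
    double best = 0.0;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        double t0 = now_seconds();
        fn(src, src_len, dst, dst_len);
        double dt = now_seconds() - t0;
        if (dt > 0.0 && (double)dst_len / dt / 1e6 > best)
            best = (double)dst_len / dt / 1e6;
    }
    return best;
}
```

A real harness would pass an LZJB decoder in place of copy_decoder and verify the output against the original data after timing, so a fast but wrong decoder cannot win.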
> 
> I adapted the benchmark code to make it more useful for me when testing new 
> algorithms.
> 
> I also tested BSD's new changes to lzjb decompression. In this test,
> classic lzjb beats it in all but a few cases; nci above is one case where
> the BSD version wins. My experimental decoder beats them both by a wide
> margin.
> 
> I also believe LZJB compression could be made significantly faster.
> Experiments in that regard are on my "todo" list.
> 
> Ideally, once this is cleaned up, I would propose it (or an improved
> successor) as a replacement for or supplement to the existing lzjb
> decompression implementation.
> 
> Steven (Strontium)
> 
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
> 

