Hi all,

After a conversation on IRC with Ryao about lzjb performance and the
proposed BSD version of the LZJB decompressor, I decided to modify the lz4
benchmark code and wedge in lzjb from ZFS to compare them.

I have published the code and the results here:
https://github.com/stevenj/lzjbbench

In the process I hacked up an experimental lzjb decompression
implementation.  It is not based on the existing code; it is a
from-scratch decoder for the bit stream.

In the results, my decoder is identified as "HAX_lzjb_decompress".

Sample results:
ALGORITHM            FILE NAME    FILE SIZE  COMPRESSED SIZE  BLOCK SIZE    MB/s     DIFF
HAX_lzjb_decompress  enwik8       100000000         68721036     1048576   443.8  133.71%
ZFS_lzjb_decompress  enwik8       100000000         78636337        1024   331.9
HAX_lzjb_decompress  silesia.zip   68182744         76529235        1024    2635  579.50%
ZFS_lzjb_decompress  silesia.zip   68182744         76486571     4194304   454.7
HAX_lzjb_decompress  mozilla       51220480         29853404        4096   616.9  150.68%
ZFS_lzjb_decompress  mozilla       51220480         28868591     4194304   409.4
HAX_lzjb_decompress  webster       41458703         26566596        4096   466.6  138.37%
ZFS_lzjb_decompress  webster       41458703         30135465        1024   337.2
HAX_lzjb_decompress  enwik8.zip    36445475         40985240     1048576  2792.3  614.64%
ZFS_lzjb_decompress  enwik8.zip    36445475         40985489       65536   454.3
HAX_lzjb_decompress  nci           33553445         11088497        1024   736.7  120.91%
BSD_lzjb_decompress  nci           33553445          8714892     4194304   609.3

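Each pair compares my algorithm's WORST result against the alternative's
BEST.  DIFF is my decoder's speed as a percentage of the alternative's
(for enwik8, 443.8 / 331.9 = 133.71%).
This is pure C, built with -O3 and run on an AMD FX 8150.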

My github has the full spreadsheet with all the data if anyone is 
interested.

A few things I would like to qualify.  My algorithm has had no substantial
speed tweaking; it is just a first attempt at a faster method.
It primarily works by overcopying and using 8-byte transfers wherever
possible.  Basically, the theory is that it is just as expensive to write
one byte to memory as it is to write 8 (at least on a 64-bit machine), so I
write 8 and then adjust the pointers (which are cheap register operations).
It also picks up some easy-to-optimize corner cases, which is why it
performs so well when decompressing incompressible data.  I know there is
still room for improvement.
It is hacky and I haven't cleaned it up; it is a single day's coding, so I
am sure it can be a lot nicer.
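
To make the overcopy idea concrete, here is a minimal sketch in C.  It is
not the code from the repository: the names copy8, copy_literals and
copy_match are made up for illustration, and the byte-wise fallback for
short offsets is just one simple way to keep overlapping matches correct.
It assumes the caller leaves at least 7 spare bytes after the logical end
of both the source and destination buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Move 8 bytes with one load and one store.  A fixed-size memcpy is
 * normally compiled to a single 64-bit move and avoids alignment and
 * aliasing problems. */
static void copy8(uint8_t *dst, const uint8_t *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof v);
    memcpy(dst, &v, sizeof v);
}

/* Copy len literal bytes in 8-byte chunks (hypothetical helper).
 * May read and write up to 7 bytes past src + len and dst + len; the
 * returned pointer is advanced by exactly len, so the extra bytes are
 * simply overwritten by whatever is decoded next. */
static uint8_t *copy_literals(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i += 8)
        copy8(dst + i, src + i);
    return dst + len;
}

/* Copy a len-byte match that starts off bytes behind dst (hypothetical
 * helper).  With off >= 8 each chunk only reads bytes that are already
 * final, so 8-byte overcopy is safe.  Shorter offsets overlap the bytes
 * being written, so this sketch falls back to a byte loop there. */
static uint8_t *copy_match(uint8_t *dst, size_t off, size_t len)
{
    const uint8_t *src = dst - off;

    assert(off > 0);
    if (off >= 8) {
        for (size_t i = 0; i < len; i += 8)
            copy8(dst + i, src + i);
    } else {
        for (size_t i = 0; i < len; i++)
            dst[i] = src[i];
    }
    return dst + len;
}
```

Both helpers return dst + len, not the address after the last byte
actually written, which is the "adjust the pointers" step described above.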

The LZ4 test suite is good: it tries, as much as it can, to test ONLY the
speed of decompression or compression and to eliminate IO.  This is good,
because IO is a variable but the efficiency of the algorithm is not.  An
inefficient algorithm may look much better than it really is if slow IO is
allowed to cloud the result.
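
As a rough illustration of keeping IO out of the measurement, the sketch
below times only the call that processes buffers already held in memory.
It is not the LZ4 benchmark code: codec_fn, best_pass_seconds, mb_per_sec
and copy_codec are hypothetical names, 1 MB is assumed to be 1,000,000
bytes, and clock() (CPU time, coarse resolution) stands in for whatever
timer a real harness would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the routine under test: returns the number
 * of bytes written to dst. */
typedef size_t (*codec_fn)(const uint8_t *src, size_t src_len,
                           uint8_t *dst, size_t dst_cap);

/* Throughput in MB/s, assuming 1 MB = 1,000,000 bytes. */
static double mb_per_sec(size_t bytes, double seconds)
{
    return seconds > 0.0 ? (double)bytes / 1e6 / seconds : 0.0;
}

/* Run fn iters times over in-memory buffers and return the shortest
 * single-pass time in seconds.  Any file reading has to happen before
 * this is called, so none of it lands inside the timed region. */
static double best_pass_seconds(codec_fn fn,
                                const uint8_t *src, size_t src_len,
                                uint8_t *dst, size_t dst_cap, int iters)
{
    double best = -1.0;

    for (int i = 0; i < iters; i++) {
        clock_t t0 = clock();
        size_t n = fn(src, src_len, dst, dst_cap);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;

        assert(n <= dst_cap);
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Stand-in "codec" that just copies, so the sketch runs on its own. */
static size_t copy_codec(const uint8_t *src, size_t src_len,
                         uint8_t *dst, size_t dst_cap)
{
    size_t n = src_len < dst_cap ? src_len : dst_cap;
    memcpy(dst, src, n);
    return n;
}
```

A real harness would run many passes per timing sample so that the timer
resolution does not dominate the result on small inputs.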

I adapted the benchmark code to make it more useful for me when testing new 
algorithms.

I also tested the new changes BSD made to lzjb decompression.  In this
test, classical lzjb beats it in all but a few cases; nci above is one
case where the BSD version wins.  My experimental decoder beats them both
by a wide margin.

I also believe LZJB compression could be made significantly faster.
Experiments in that regard are on my "todo" list.

Ideally, once this is cleaned up, I would propose it, or an improved
successor, as a replacement for or supplement to the existing
implementation of lzjb decompression.

Steven (Strontium)

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
