Hi all,

After a conversation on IRC with Ryao about lzjb performance and the proposed BSD version of the LZJB decompressor, I decided to modify the lz4 benchmark code and wedge in lzjb from ZFS to compare them.
I have published the code and the results here: https://github.com/stevenj/lzjbbench

In the process I hacked up an experimental lzjb decompression implementation. It is not based on the existing code; it decodes the bit stream from scratch. In the results my decoder is identified as "HAX_lzjb_decompress".

Sample results:

ALGORITHM            FILE NAME    FILE SIZE  COMPRESSED SIZE  BLOCK SIZE    MB/s     DIFF
HAX_lzjb_decompress  enwik8       100000000         68721036     1048576   443.8  133.71%
ZFS_lzjb_decompress  enwik8       100000000         78636337        1024   331.9
HAX_lzjb_decompress  silesia.zip   68182744         76529235        1024    2635  579.50%
ZFS_lzjb_decompress  silesia.zip   68182744         76486571     4194304   454.7
HAX_lzjb_decompress  mozilla       51220480         29853404        4096   616.9  150.68%
ZFS_lzjb_decompress  mozilla       51220480         28868591     4194304   409.4
HAX_lzjb_decompress  webster       41458703         26566596        4096   466.6  138.37%
ZFS_lzjb_decompress  webster       41458703         30135465        1024   337.2
HAX_lzjb_decompress  enwik8.zip    36445475         40985240     1048576  2792.3  614.64%
ZFS_lzjb_decompress  enwik8.zip    36445475         40985489       65536   454.3
HAX_lzjb_decompress  nci           33553445         11088497        1024   736.7  120.91%
BSD_lzjb_decompress  nci           33553445          8714892     4194304   609.3

Each pair shows my algorithm's WORST result against the alternatives' BEST. DIFF is the HAX speed as a percentage of the speed in the row below it. This is built with -O3, run on an AMD FX 8150, and is pure C. My GitHub has the full spreadsheet with all the data if anyone is interested.

Things I would like to qualify: my algorithm has had no substantial speed tweaking; it's just a first attempt at a faster method. It primarily works by overcopying and using 8-byte transfers wherever possible. Basically, the theory is that it's just as expensive to write one byte to memory as it is to write 8 (at least on a 64-bit machine), so I write 8 and then adjust the pointers (which are cheap register operations). It also picks up some easy-to-optimize corner cases, which is why it performs so well on decompressing incompressible data. I know there is still room for improvement. It's hacky and I haven't cleaned it up; it's a single day's coding, so I am sure it can be a lot nicer.
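
To illustrate the overcopying idea described above, here is a minimal sketch in C. It is not the code from the repository: the names copy8 and copy_match are made up for this example, and it assumes the caller leaves at least 8 spare bytes past the end of the output buffer so that the overshooting writes stay in bounds. The byte-wise fallback for short offsets is also an assumption of the sketch, needed because a match whose source is less than 8 bytes behind the destination overlaps itself within a single 8-byte word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 8 bytes as one unit. A memcpy with a constant size of 8 is
 * normally compiled to a single unaligned 64-bit load and store. */
static inline void copy8(uint8_t *dst, const uint8_t *src)
{
    uint64_t w;
    memcpy(&w, src, sizeof w);
    memcpy(dst, &w, sizeof w);
}

/* Copy an LZ-style match of `len` bytes that starts `offset` bytes
 * behind the write pointer `dst`. Returns the new write pointer.
 *
 * Assumption: up to 7 bytes past dst + len may be overwritten with
 * junk, so the caller must provide that much slack. The junk is
 * harmless because later output overwrites it. */
static uint8_t *copy_match(uint8_t *dst, size_t offset, size_t len)
{
    const uint8_t *src = dst - offset;
    uint8_t *end = dst + len;

    assert(offset > 0);

    if (offset < 8) {
        /* Source and destination overlap inside one 8-byte word
         * (a short repeating pattern), so copy byte by byte. */
        while (dst < end)
            *dst++ = *src++;
        return end;
    }

    /* Write whole 8-byte words, overshooting the end if needed... */
    while (dst < end) {
        copy8(dst, src);
        dst += 8;
        src += 8;
    }
    /* ...then fix up the pointer, which is a cheap register operation. */
    return end;
}
```

For example, with "abcdefghij" already in the output, copy_match(out + 10, 10, 13) issues two 8-byte copies (16 bytes written) but returns out + 23, leaving "abcdefghijabcdefghijabc" as the valid output.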

The LZ4 test suite is good: as far as it can, it tries to test ONLY the speed of decompression or compression and to eliminate IO. This is good, because IO is a variable but the efficiency of the algorithm is not. An inefficient algorithm may look much better than it really is if slow IO is allowed to cloud the result. I adapted the benchmark code to make it more useful for me when testing new algorithms.

I also tested the new changes BSD made to lzjb decompression. Except in a very few cases, classical lzjb beats it in this test; nci above is one case where the BSD version wins. My experimental decoder beats them both by a wide margin.

I also believe LZJB compression can be made significantly faster. Experiments in that regard are on my "todo" list.

Ideally, once this is clean, I would propose it or an improved successor as a replacement for, or supplement to, the existing implementation of lzjb decompression.

Steven (Strontium)
