A few things:
1. It is 150% of the speed, not 150% faster, which means it is only 50% faster. This is also an average: if you look at each of the files in the Silesia corpus, real-world decompression speed gains seem to range from about 35% to 55%. 50% is the rough average, and only at -O3; -O2 is slightly slower.
2. The table is garbled. Here are the results from my Intel Core 2 Quad
9550 on an uncompressed tar archive of the Silesia corpus:
$ ./fullbench -d /tmp/silesia.tar
*** LZ4/LZJB speed analyzer 64-bits, by Yann Collet (with LZJB hacks by
Strontium) (Oct 24 2013) ***
/tmp/silesia.tar :
LZ4_decompress_fast : 211957760 -> 1016.6 MB/s
LZ4_decompress_fast_withPr : 211957760 -> 1017.4 MB/s
LZ4_decompress_safe : 211957760 -> 967.8 MB/s
LZ4_decompress_safe_withPr : 211957760 -> 967.1 MB/s
LZ4_decompress_safe_partia : 211957760 -> 965.6 MB/s
ZFS lzjb_decompress : 211957760 -> 339.3 MB/s
BSD lzjb_decompress : 211957760 -> 307.4 MB/s
HAX lzjb_decompress : 211957760 -> 501.5 MB/s
** TOTAL ** :
LZ4_decompress_fast : 211957760 -> 1016.6 MB/s
LZ4_decompress_fast_w : 211957760 -> 1017.4 MB/s
LZ4_decompress_safe : 211957760 -> 967.8 MB/s
LZ4_decompress_safe_w : 211957760 -> 967.1 MB/s
LZ4_decompress_safe_p : 211957760 -> 965.6 MB/s
ZFS lzjb_decompress : 211957760 -> 339.3 MB/s
BSD lzjb_decompress : 211957760 -> 307.4 MB/s
HAX lzjb_decompress : 211957760 -> 501.5 MB/s
That is at -O3. Here is -O2:
$ ./fullbenchO2 -d /tmp/silesia.tar
*** LZ4/LZJB speed analyzer 64-bits, by Yann Collet (with LZJB hacks by
Strontium) (Oct 24 2013) ***
/tmp/silesia.tar :
LZ4_decompress_fast : 211957760 -> 1015.0 MB/s
LZ4_decompress_fast_withPr : 211957760 -> 1016.6 MB/s
LZ4_decompress_safe : 211957760 -> 972.7 MB/s
LZ4_decompress_safe_withPr : 211957760 -> 995.1 MB/s
LZ4_decompress_safe_partia : 211957760 -> 972.3 MB/s
ZFS lzjb_decompress : 211957760 -> 340.0 MB/s
BSD lzjb_decompress : 211957760 -> 311.8 MB/s
HAX lzjb_decompress : 211957760 -> 478.6 MB/s
** TOTAL ** :
LZ4_decompress_fast : 211957760 -> 1015.0 MB/s
LZ4_decompress_fast_w : 211957760 -> 1016.6 MB/s
LZ4_decompress_safe : 211957760 -> 972.7 MB/s
LZ4_decompress_safe_w : 211957760 -> 995.1 MB/s
LZ4_decompress_safe_p : 211957760 -> 972.3 MB/s
ZFS lzjb_decompress : 211957760 -> 340.0 MB/s
BSD lzjb_decompress : 211957760 -> 311.8 MB/s
HAX lzjb_decompress : 211957760 -> 478.6 MB/s
Interestingly, the other decompressors are faster at -O2 than at -O3
while Steven's is faster at -O3 than at -O2. It might be possible to
obtain the -O3 performance at -O2 by prefixing the function with
something like:
#ifdef __GNUC__
__attribute__((optimize("unroll-loops")))
#endif
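
A minimal sketch of where that prefix would go, assuming GCC. The macro name UNROLL_LOOPS and the function copy_bytes are hypothetical stand-ins, not code from the benchmark. The guard also excludes clang, which defines __GNUC__ but, as far as I know, ignores this attribute with a warning. Whether this recovers the -O3 numbers would still have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro name; expands to nothing where the attribute is unsupported. */
#if defined(__GNUC__) && !defined(__clang__)
#define UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS
#endif

/* Stand-in hot loop (not the lzjb decoder): copies len bytes one at a time. */
UNROLL_LOOPS
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Only the annotated function is affected; the rest of the file is still compiled with whatever -O level is on the command line.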
On 10/24/2013 11:15 AM, Strontium wrote:
> Hi all,
>
> After a conversation on IRC with Ryao about lzjb performance and the
> proposed BSD version of the LZJB decompressor, I decided to modify the lz4
> benchmark code and wedge in lzjb from ZFS to compare them.
>
> I have published code and the result here:
> https://github.com/stevenj/lzjbbench
>
> In the process I hacked up an experimental lzjb decompression
> implementation. It is not based on the existing code; it is a
> from-scratch decoder for the bit stream.
>
> In the results my decoder is identified as "HAX_lzjb_decompress"
>
> Sample results:
> ALGORITHM            FILE NAME    FILE SIZE  COMPRESSED SIZE  BLOCK SIZE    MB/s     DIFF
> HAX_lzjb_decompress  enwik8       100000000         68721036     1048576   443.8  133.71%
> ZFS_lzjb_decompress  enwik8       100000000         78636337        1024   331.9
> HAX_lzjb_decompress  silesia.zip   68182744         76529235        1024    2635  579.50%
> ZFS_lzjb_decompress  silesia.zip   68182744         76486571     4194304   454.7
> HAX_lzjb_decompress  mozilla       51220480         29853404        4096   616.9  150.68%
> ZFS_lzjb_decompress  mozilla       51220480         28868591     4194304   409.4
> HAX_lzjb_decompress  webster       41458703         26566596        4096   466.6  138.37%
> ZFS_lzjb_decompress  webster       41458703         30135465        1024   337.2
> HAX_lzjb_decompress  enwik8.zip    36445475         40985240     1048576  2792.3  614.64%
> ZFS_lzjb_decompress  enwik8.zip    36445475         40985489       65536   454.3
> HAX_lzjb_decompress  nci           33553445         11088497        1024   736.7  120.91%
> BSD_lzjb_decompress  nci           33553445          8714892     4194304   609.3
>
> Each of these is my algorithm's WORST result vs. the alternatives' BEST.
> This is built with -O3, run on an AMD FX 8150, and is pure C.
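
To make the DIFF column concrete: the numbers are consistent with DIFF being the HAX speed as a percentage of the best alternative's speed, which is not the same as "percent faster". A small sketch of the arithmetic follows; the function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Speed of the candidate as a percentage of the baseline (what DIFF appears to show). */
double percent_of_baseline(double candidate_mbps, double baseline_mbps)
{
    assert(baseline_mbps > 0.0);
    return 100.0 * candidate_mbps / baseline_mbps;
}

/* How much faster the candidate is than the baseline, in percent. */
double percent_faster(double candidate_mbps, double baseline_mbps)
{
    return percent_of_baseline(candidate_mbps, baseline_mbps) - 100.0;
}
```

For the mozilla row, percent_of_baseline(616.9, 409.4) comes to roughly 150.7, while percent_faster(616.9, 409.4) comes to roughly 50.7.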
>
> My github has the full spreadsheet with all the data if anyone is
> interested.
>
> Things I would like to qualify: my algorithm has had no substantial speed
> tweaking; it is just a first attempt at a faster method.
> It primarily works by overcopying and using 8-byte transfers wherever
> possible. Basically, the theory is that it is just as expensive to write
> one byte to memory as it is to write 8 (at least on a 64-bit machine), so
> I write 8 and then adjust the pointers (which are cheap register
> operations). It also picks up some easy-to-optimize corner cases, which
> is why it performs so well when decompressing incompressible data. I know
> there is still room for improvement.
> It is hacky and I haven't cleaned it up; it is a single day's coding, so
> I am sure it can be a lot nicer.
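
The overcopying idea described above can be sketched roughly as follows. This is not the HAX decoder, only an illustration under stated assumptions: the function name overcopy8 is hypothetical, the buffers must not overlap (so it fits literal runs, not short-offset matches), and both buffers need up to 7 bytes of slack past the requested length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes in 8-byte chunks, deliberately writing past dst + len
 * up to the next multiple of 8 ("overcopying"). The caller must guarantee
 * that src and dst do not overlap and that both have that much slack.
 * Returns dst + len, the logical end of the copy.
 */
unsigned char *overcopy8(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    unsigned char *end = dst + len;
    while (dst < end) {
        memcpy(dst, src, 8); /* fixed-size copy; compilers typically emit one 64-bit load and store */
        dst += 8;
        src += 8;
    }
    return end; /* pointer adjustment: the logical end, not the overcopied end */
}
```

The bytes written beyond the returned pointer are scratch; the next copy simply overwrites them.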
>
> The LZ4 test suite is good: as much as it can, it tries to test ONLY the
> speed of decompression or compression and to eliminate IO. This is good,
> because IO is a variable but the efficiency of the algorithm is not. An
> inefficient algorithm may look much better than it really is if slow IO is
> allowed to cloud the result.
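
A rough sketch of timing a routine with IO kept out of the measurement. This is not the fullbench code; decode_fn, mb_per_sec and best_time are hypothetical names, the input is assumed to be loaded into memory before timing starts, and 1 MB is taken as 1,000,000 bytes, which is an assumption about the reporting convention.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void (*decode_fn)(const unsigned char *src, unsigned char *dst, size_t n);

/* Throughput in MB/s, taking 1 MB = 1,000,000 bytes (an assumption). */
double mb_per_sec(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}

/*
 * Run fn `iterations` times over buffers that are already in memory and
 * return the shortest CPU time in seconds for a single run, or -1.0 if
 * iterations is not positive. No file IO happens in the timed region.
 */
double best_time(decode_fn fn, const unsigned char *src, unsigned char *dst,
                 size_t n, int iterations)
{
    double best = -1.0;
    for (int i = 0; i < iterations; i++) {
        clock_t start = clock();
        fn(src, dst, n);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Taking the best of several runs is one way to reduce noise from other activity on the machine; the input needs to be large enough that a single run takes measurably longer than the clock resolution.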
>
> I adapted the benchmark code to make it more useful for me when testing new
> algorithms.
>
> I also tested the new changes BSD made to lzjb decompression. Except in
> very few cases, classical lzjb beats it in this test. nci above is one
> case where the BSD one wins. My experimental decoder beats them both
> by a wide margin.
>
> I also believe LZJB compression can be made significantly
> faster. Experiments in that regard are on my "todo" list.
>
> Ideally, when this is clean, I would propose it or an improved successor
> as a replacement for or supplement to the existing implementation of lzjb
> decompression.
>
> Steven (Strontium)
>
>
>
>
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
>
