GCC can generate code using MMX, SSE and/or SSE2 instructions on x86_64.
That could explain the discrepancy between the benchmark results for the
original ZFS lzjb implementation and the BSD version on the one hand,
and the claims made by the FreeBSD developers on the other. To my
knowledge, those instructions are not permissible on any current Open
ZFS platform, so the real-world performance of Strontium/Justin's
version could be lower than the benchmark numbers suggest.

I tried invoking `make all CFLAGS='-mno-mmx -mno-sse -mno-sse2'` to see
what the difference is, but doing that triggered the following GCC bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55185

Out of curiosity, I have started a GCC 4.4.7 build, on the assumption
that this is a recent GCC regression, so that I can try again with an
older compiler.

With that said, I like the approach Steven took to benchmarking the
various implementations, and I think it is a step in the right
direction. It will be even more meaningful once it can be run with MMX,
SSE and SSE2 instructions disabled.

On 10/24/2013 11:15 AM, Strontium wrote:
> Hi all,
> 
> After a conversation on IRC with Ryao about lzjb performance and the 
> proposed BSD version of the LZJB decompressor, I decided to modify the 
> lz4 benchmark code and wedge in lzjb from ZFS to compare them.
> 
> I have published code and the result here: 
> https://github.com/stevenj/lzjbbench
> 
> In the process I hacked up an experimental lzjb decompression 
> implementation.  It is not based on the existing code; it is a 
> from-scratch decoding of the bit stream.
> 
> In the results my decoder is identified as "HAX_lzjb_decompress"
> 
> Sample results:
> ALGORITHM            FILE NAME    FILE SIZE  COMPRESSED SIZE  BLOCK SIZE  MB/s    DIFF
> HAX_lzjb_decompress  enwik8       100000000  68721036         1048576     443.8   133.71%
> ZFS_lzjb_decompress  enwik8       100000000  78636337         1024        331.9
> HAX_lzjb_decompress  silesia.zip  68182744   76529235         1024        2635    579.50%
> ZFS_lzjb_decompress  silesia.zip  68182744   76486571         4194304     454.7
> HAX_lzjb_decompress  mozilla      51220480   29853404         4096        616.9   150.68%
> ZFS_lzjb_decompress  mozilla      51220480   28868591         4194304     409.4
> HAX_lzjb_decompress  webster      41458703   26566596         4096        466.6   138.37%
> ZFS_lzjb_decompress  webster      41458703   30135465         1024        337.2
> HAX_lzjb_decompress  enwik8.zip   36445475   40985240         1048576     2792.3  614.64%
> ZFS_lzjb_decompress  enwik8.zip   36445475   40985489         65536       454.3
> HAX_lzjb_decompress  nci          33553445   11088497         1024        736.7   120.91%
> BSD_lzjb_decompress  nci          33553445   8714892          4194304     609.3
> 
> Each of these is my algorithm's WORST result vs the alternative's BEST.
> This was built with -O3 and run on an AMD FX 8150, and is pure C.
> 
> My github has the full spreadsheet with all the data if anyone is 
> interested.
> 
> Things I would like to qualify: my algorithm has had no substantial speed 
> tweaking; it is just a first attempt at a faster method.
> It primarily works by overcopying, using 8-byte transfers wherever 
> possible.  Basically, the theory is that it is just as expensive to write 
> one byte to memory as it is to write 8 (at least on a 64-bit machine), so I 
> write 8 and then adjust the pointers (which are cheap register operations).  
> It also picks up some easy-to-optimize corner cases, which is why it 
> performs so well on decompressing incompressible data.  I know there is 
> still room for improvement.
> It is hacky and I haven't cleaned it up; it is a single day's coding, so I 
> am sure it can be made a lot nicer.
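[For readers unfamiliar with the overcopy trick Steven describes, here is a hypothetical sketch of the idea; it is an illustration, not the code from the lzjbbench repository:]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overcopy sketch: always copy a full 8-byte chunk, then advance by the
 * true length.  This is only safe when the destination buffer has at
 * least 7 bytes of slack past the end of the copy, and a real LZJB
 * decoder must still fall back to byte copies for overlapping match
 * copies whose offset is smaller than 8. */
static void overcopy8(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
    while (i < len) {
        memcpy(dst + i, src + i, 8);  /* one 64-bit load/store on x86_64 */
        i += 8;                       /* cheap register arithmetic */
    }
    /* The caller advances its output pointer by len, not by the
     * rounded-up i, so the extra bytes are overwritten later. */
}
```

The win is that the per-byte loop control disappears: a 13-byte literal run becomes two 8-byte moves instead of thirteen 1-byte moves.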
> 
> The LZ4 test suite is good: it tries, as much as it can, to test ONLY the 
> speed of decompression or compression and to eliminate IO.  This is good, 
> because IO is a variable but the efficiency of the algorithm is not.  An 
> inefficient algorithm may look much better than it really is if slow IO is 
> allowed to cloud the result.
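[The IO-free measurement style described here amounts to timing the codec call alone over buffers that are already in memory. A minimal hypothetical harness in that spirit, not the actual lz4 bench code, might look like:]

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* The data is already resident in memory, so only the decompression
 * call is inside the timed region and disk IO never enters the MB/s
 * figure. */
typedef void (*decomp_fn)(const void *src, size_t srclen,
                          void *dst, size_t dstlen);

/* Trivial stand-in codec for demonstration: a straight memcpy. */
static void copy_codec(const void *src, size_t srclen,
                       void *dst, size_t dstlen)
{
    memcpy(dst, src, srclen < dstlen ? srclen : dstlen);
}

static double bench_mbps(decomp_fn fn, const void *src, size_t srclen,
                         void *dst, size_t dstlen, int iters)
{
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++)
        fn(src, srclen, dst, dstlen);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        secs = 1e-9;  /* guard against a timer resolution of zero */
    /* Report MB/s of decompressed output, as in the table above. */
    return ((double)dstlen * iters) / (secs * 1e6);
}
```

Repeating the call many times over the same in-memory block also keeps it warm in cache, so the number reflects the algorithm rather than the memory subsystem or the disk.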
> 
> I adapted the benchmark code to make it more useful for me when testing new 
> algorithms.
> 
> I also tested the new changes BSD made to lzjb decompression.  Except in 
> very few cases, classical lzjb beats it in this test; nci above is one 
> case where the BSD one wins.  My experimental decoder beats them both 
> by a wide margin.
> 
> I also believe LZJB compression could be made significantly faster.  
> Experiments in that regard are on my "todo" list.
> 
> Ideally, once this is cleaned up, I would propose it (or an improved 
> successor) as a replacement for, or supplement to, the existing 
> implementation of lzjb decompression.
> 
> Steven (Strontium)
> 
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
> 

