Thanks Matthew, I am very happy to kill the 32-bit code path. It makes the code harder to read, test and write, and it is bad enough that I already have to account for endianness; so unless anyone else has a strong objection, I will remove that variant. I do not have access to a big-endian machine or VM, so if anyone with access to one is willing to work with me, I would like to have the tests run on it so I can correct any implementation bugs.
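For anyone wondering what "accounting for endianness" involves here: when a decoder reads more than one byte of the stream at a time, a raw word load gives different values on big-endian and little-endian hosts. The usual fix is to assemble the word a byte at a time, roughly like the sketch below (an illustrative sketch only; this is not the code in lzjb_fast.c, and the helper name is made up):

#include <stdint.h>

/*
 * Byte-order-independent 64-bit little-endian load.  Building the word
 * explicitly from its bytes yields the same value on big- and
 * little-endian hosts, so nothing downstream has to care about host
 * byte order.
 */
static inline uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (i * 8);
	return (v);
}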
I will try to get the formatting in line with ZFS style in my next update.

Regarding speeds and testing [below, 100% = same speed, not twice the speed]:

Between the current LZJB Fast and my original decoder I am seeing a general speed improvement of between 104.5% and 208%, depending on the data. Compared to stock LZJB I am seeing speeds of between 143% and 1955%. Yes, the 1955% is correct; the new LZJB decompressor is extremely good at handling uncompressible data.

However, there are two files in my test set which concern me, and I need to work out what is happening with them.

E.coli (http://corpus.canterbury.ac.nz/descriptions/large/E.coli.html): my original decompressor is 131% the speed of my current decompressor. On its own this would not be so concerning, since there are trade-offs in performance tuning, and the new decompressor is still 144% the speed of stock LZJB.

kennedy.xls (http://corpus.canterbury.ac.nz/descriptions/cantrbry/Excl.html), however, is worse: my old decompressor is 161% the speed of my current one, AND the stock decompressor is 114% the speed of my current one. I suspect (hope) the slowdowns in these two files are related. For kennedy.xls my decompressor is the slowest by a long way, and I need to work out why and correct it. I am using the standard corpora for compression testing to give myself a spread of sample "real world" data, and if kennedy.xls produces such bad results, real-world data that follows similar patterns could be expected to do the same. The strange thing is that I am not aware of any refactoring of the code which could account for this result. There are a couple of other slowdowns in my test data, but all of them are artificial sequences, so I am not too concerned about those, and in all of those cases the new decompressor is still faster than stock. But kennedy.xls is a big concern for me, and until I work it out I am not confident in the algorithm.

For test data I am using the Silesia corpus, all the corpora from http://corpus.canterbury.ac.nz/descriptions/, and enwik8 (including their archive files). If anyone has suggestions for another set of representative data I am happy to add it to my test set.
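For anyone not familiar with the format, this is roughly the shape of the bitstream the decoder has to deal with. The sketch below is a simplified, illustrative decode loop following the usual LZJB layout (one control bit per item; copy items are 2 bytes holding a 6-bit length-minus-3 and a 10-bit offset); it is not the stock ZFS lzjb.c and not my lzjb_fast.c. Uncompressible input is almost all literal items, which is the easy path; the loop also shows why the format resists fast extraction, since every single item costs a control-bit test and a data-dependent branch.

#include <stddef.h>
#include <stdint.h>

#define	MATCH_BITS	6
#define	MATCH_MIN	3
#define	OFFSET_MASK	((1 << (16 - MATCH_BITS)) - 1)

/* Simplified, illustrative LZJB decode loop; not production code. */
static int
lzjb_decode_sketch(const uint8_t *src, uint8_t *dst, size_t d_len)
{
	uint8_t *d_start = dst;
	uint8_t *d_end = dst + d_len;
	uint8_t copymap = 0;
	int copymask = 0;

	while (dst < d_end) {
		if (copymask == 0) {
			copymap = *src++;	/* next 8 control bits */
			copymask = 1;
		}
		if (copymap & copymask) {
			/* copy item: 6-bit match length, 10-bit offset */
			int mlen = (src[0] >> (8 - MATCH_BITS)) + MATCH_MIN;
			int offset = ((src[0] << 8) | src[1]) & OFFSET_MASK;
			const uint8_t *cpy = dst - offset;

			src += 2;
			if (cpy < d_start)
				return (-1);	/* corrupt stream */
			while (--mlen >= 0 && dst < d_end)
				*dst++ = *cpy++;
		} else {
			*dst++ = *src++;	/* literal byte */
		}
		copymask = (copymask << 1) & 0xff;
	}
	return (0);
}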
Steven.

On Sun, Oct 27, 2013 at 3:36 AM, Matthew Ahrens <[email protected]> wrote:
> Steven, this work to speed up LZJB is great. I look forward to seeing it
> in illumos & all platforms. I just have a few general comments:
>
> Personally, I don't really care about performance on 32-bit platforms. So
> I'd prefer to simplify the code by just having the 64-bit optimized
> version, and letting the compiler do its best with uint64_t's on 32-bit.
> But I'm not sure if anyone else cares about this level of performance on
> 32-bit, though.
>
> The code should be formatted like the rest of the ZFS code. The "cstyle"
> program on illumos can check some of this. The main thing I noticed is the
> indentation, which should be one tab per level (displayed as 8 spaces).
>
> --matt
>
> On Sat, Oct 26, 2013 at 11:49 AM, Strontium <[email protected]> wrote:
>
>> Update:
>> I have updated my source at https://github.com/stevenj/lzjbbench
>>
>> I have cleaned up the source of the new method considerably. I now no
>> longer think of it as "hacky".
>>
>> It lives in its own file:
>> https://github.com/stevenj/lzjbbench/blob/master/lzjb_fast.c
>>
>> The latest version seems at least 20% faster than the previous version.
>> It is pure C, with -O2 optimization, BUT NO MMX, and NO SSE.
>>
>> The only instruction "tweak" I use is the GCC builtin "__builtin_ctz",
>> which improves performance by a couple of percentage points. But there
>> is a pure C fallback which is still very fast should it be unavailable
>> at compile time.
>>
>> It "should" work on big-endian and 32-bit architectures, BUT I have not
>> tested it on these.
>>
>> The code has been written with the intention of it integrating easily
>> into ZFS, and so it includes ZFS headers and uses ZFS types. As far as
>> I can tell it should plug straight in with very little effort.
>>
>> On 64-bit little-endian, it has been run through extensive data tests
>> using the compression corpuses: Calgary, Canterbury, Silesia and
>> enwik8. It can successfully decode LZJB-compressed versions of all
>> these files. I am not aware of any data decoding issues. I have also
>> produced customized test data to exercise the RLE optimization, as it
>> has many possible code paths; each path has been fully covered by
>> testing.
>>
>> The LZJB bitstream does not lend itself to high speed extraction; I
>> believe that without resorting to assembler and extended instruction
>> sets there is little further room for improvement. An improved LZJB
>> bitstream generator would in theory allow decompression to speed up,
>> however at the moment the only LZJB compressor is the stock one from
>> ZFS.
>>
>> I am currently running an exhaustive test run. Full results will follow
>> in a follow-up post tomorrow.
>>
>> Subject to testing on 32-bit and big-endian architectures, or any
>> unexpected results from the current test run, I believe this improved
>> decompressor is complete and is ready for wider testing.
>>
>> Steven Johnson
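Regarding the "__builtin_ctz" with a pure C fallback mentioned in the quoted message above, the pattern is roughly the following (an illustrative sketch only, not the actual lzjb_fast.c code; how the real decoder uses the result is not shown here):

#include <stdint.h>

/*
 * Count trailing zero bits.  With GCC/Clang this compiles down to a
 * single instruction via the builtin; elsewhere a plain C loop gives
 * the same result.  Like the builtin, the result is undefined for
 * x == 0.
 */
static inline int
ctz32(uint32_t x)
{
#if defined(__GNUC__)
	return (__builtin_ctz(x));
#else
	int n = 0;

	while ((x & 1) == 0) {
		x >>= 1;
		n++;
	}
	return (n);
#endif
}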
