Thanks Matthew, I am very happy to kill the 32-bit code path. It makes the code harder to read, test and write, and it is bad enough that I already have to account for endianness; so unless anyone else has a strong objection, I will remove that variant. I do not have access to a big-endian machine or VM, so if anyone with access to one is willing to work with me, I would like to have the tests run on it so I can correct any implementation bugs.
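For anyone wondering what "accounting for endianness" involves here: when a decoder reads more than one byte of the stream at a time, a raw word load gives different values on big-endian and little-endian hosts. The usual fix is to assemble the word a byte at a time, roughly like the sketch below (an illustrative sketch only; this is not the code in lzjb_fast.c, and the helper name is made up):

#include <stdint.h>

/*
 * Byte-order-independent 64-bit little-endian load.  Building the word
 * explicitly from its bytes yields the same value on big- and
 * little-endian hosts, so nothing downstream has to care about host
 * byte order.
 */
static inline uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (i * 8);
	return (v);
}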
I will try to get the formatting in line with ZFS style in my next update.

Regarding speeds and testing [below, 100% = same speed, not twice the speed]:

Between the current LZJB Fast and my original decoder I am seeing a general speed improvement of between 104.5% and 208%, depending on the data. Compared to stock LZJB I am seeing speeds of between 143% and 1955%. Yes, the 1955% is correct; the new LZJB decompressor is extremely good at handling uncompressible data.

However, there are two files in my test set which concern me, and I need to work out what is happening with them.

E.coli (http://corpus.canterbury.ac.nz/descriptions/large/E.coli.html): my original decompressor is 131% the speed of my current decompressor. On its own this would not be so concerning, since there are trade-offs in performance tuning, and the new decompressor is still 144% the speed of stock LZJB.

kennedy.xls (http://corpus.canterbury.ac.nz/descriptions/cantrbry/Excl.html), however, is worse: my old decompressor is 161% the speed of my current one, AND the stock decompressor is 114% the speed of my current one. I suspect (hope) the slowdowns in these two files are related. For kennedy.xls my decompressor is the slowest by a long way, and I need to work out why and correct it. I am using the standard corpora for compression testing to give myself a spread of sample "real world" data, and if kennedy.xls produces such bad results, real-world data that follows similar patterns could be expected to do the same. The strange thing is that I am not aware of any refactoring of the code which could account for this result. There are a couple of other slowdowns in my test data, but all of them are artificial sequences, so I am not too concerned about those, and in all of those cases the new decompressor is still faster than stock. But kennedy.xls is a big concern for me, and until I work it out I am not confident in the algorithm.

For test data I am using the Silesia corpus, all the corpora from http://corpus.canterbury.ac.nz/descriptions/, and enwik8 (including their archive files). If anyone has suggestions for another set of representative data I am happy to add it to my test set.
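For anyone not familiar with the format, this is roughly the shape of the bitstream the decoder has to deal with. The sketch below is a simplified, illustrative decode loop following the usual LZJB layout (one control bit per item; copy items are 2 bytes holding a 6-bit length-minus-3 and a 10-bit offset); it is not the stock ZFS lzjb.c and not my lzjb_fast.c. Uncompressible input is almost all literal items, which is the easy path; the loop also shows why the format resists fast extraction, since every single item costs a control-bit test and a data-dependent branch.

#include <stddef.h>
#include <stdint.h>

#define	MATCH_BITS	6
#define	MATCH_MIN	3
#define	OFFSET_MASK	((1 << (16 - MATCH_BITS)) - 1)

/* Simplified, illustrative LZJB decode loop; not production code. */
static int
lzjb_decode_sketch(const uint8_t *src, uint8_t *dst, size_t d_len)
{
	uint8_t *d_start = dst;
	uint8_t *d_end = dst + d_len;
	uint8_t copymap = 0;
	int copymask = 0;

	while (dst < d_end) {
		if (copymask == 0) {
			copymap = *src++;	/* next 8 control bits */
			copymask = 1;
		}
		if (copymap & copymask) {
			/* copy item: 6-bit match length, 10-bit offset */
			int mlen = (src[0] >> (8 - MATCH_BITS)) + MATCH_MIN;
			int offset = ((src[0] << 8) | src[1]) & OFFSET_MASK;
			const uint8_t *cpy = dst - offset;

			src += 2;
			if (cpy < d_start)
				return (-1);	/* corrupt stream */
			while (--mlen >= 0 && dst < d_end)
				*dst++ = *cpy++;
		} else {
			*dst++ = *src++;	/* literal byte */
		}
		copymask = (copymask << 1) & 0xff;
	}
	return (0);
}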
Steven.

On Sun, Oct 27, 2013 at 3:36 AM, Matthew Ahrens <[email protected]> wrote:
> Steven, this work to speed up LZJB is great. I look forward to seeing it
> in illumos & all platforms. I just have a few general comments:
>
> Personally, I don't really care about performance on 32-bit platforms. So
> I'd prefer to simplify the code by just having the 64-bit optimized
> version, and letting the compiler do its best with uint64_t's on 32-bit.
> But I'm not sure if anyone else cares about this level of performance on
> 32-bit, though.
>
> The code should be formatted like the rest of the ZFS code. The "cstyle"
> program on illumos can check some of this. The main thing I noticed is the
> indentation, which should be one tab per level (displayed as 8 spaces).
>
> --matt
>
> On Sat, Oct 26, 2013 at 11:49 AM, Strontium <[email protected]> wrote:
>
>> Update:
>> I have updated my source at https://github.com/stevenj/lzjbbench
>>
>> I have cleaned up the source of the new method considerably. I now no
>> longer think of it as "hacky".
>>
>> It lives in its own file:
>> https://github.com/stevenj/lzjbbench/blob/master/lzjb_fast.c
>>
>> The latest version seems at least 20% faster than the previous version.
>> It is pure C, with -O2 optimization, BUT NO MMX, and NO SSE.
>>
>> The only instruction "tweak" I use is the GCC builtin "__builtin_ctz",
>> which improves performance by a couple of percentage points. But there
>> is a pure C fallback which is still very fast should it be unavailable
>> at compile time.
>>
>> It "should" work on big-endian and 32-bit architectures, BUT I have not
>> tested it on these.
>>
>> The code has been written with the intention of it integrating easily
>> into ZFS, and so it includes ZFS headers and uses ZFS types. As far as
>> I can tell it should plug straight in with very little effort.
>>
>> On 64-bit little-endian, it has been run through extensive data tests
>> using the compression corpuses: Calgary, Canterbury, Silesia and
>> enwik8. It can successfully decode LZJB-compressed versions of all
>> these files. I am not aware of any data decoding issues. I have also
>> produced customized test data to exercise the RLE optimization, as it
>> has many possible code paths; each path has been fully covered by
>> testing.
>>
>> The LZJB bitstream does not lend itself to high speed extraction; I
>> believe that without resorting to assembler and extended instruction
>> sets there is little further room for improvement. An improved LZJB
>> bitstream generator would in theory allow decompression to speed up,
>> however at the moment the only LZJB compressor is the stock one from
>> ZFS.
>>
>> I am currently running an exhaustive test run. Full results will follow
>> in a follow-up post tomorrow.
>>
>> Subject to testing on 32-bit and big-endian architectures, or any
>> unexpected results from the current test run, I believe this improved
>> decompressor is complete and is ready for wider testing.
>>
>> Steven Johnson
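Regarding the "__builtin_ctz" with a pure C fallback mentioned in the quoted message above, the pattern is roughly the following (an illustrative sketch only, not the actual lzjb_fast.c code; how the real decoder uses the result is not shown here):

#include <stdint.h>

/*
 * Count trailing zero bits.  With GCC/Clang this compiles down to a
 * single instruction via the builtin; elsewhere a plain C loop gives
 * the same result.  Like the builtin, the result is undefined for
 * x == 0.
 */
static inline int
ctz32(uint32_t x)
{
#if defined(__GNUC__)
	return (__builtin_ctz(x));
#else
	int n = 0;

	while ((x & 1) == 0) {
		x >>= 1;
		n++;
	}
	return (n);
#endif
}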
