On Dec 10, 6:26 pm, Tatu Saloranta <[email protected]> wrote:
> On Thu, Dec 10, 2009 at 1:34 PM, Sam Van Oort <[email protected]> wrote:
> > Hi,
> > I can add this to a list of extensions to the LZF code. It's not a
> > bad idea to have a version which is fully binary compatible, which can
> > be used as a compatibility option in the future.
>
> I think it'd be nice to be compatible, especially if a separate
> library was carved out.

I'm talking with Mr. Mueller about this, but it could take some time to work out. We'll see where it goes.
> Also: using chunk identifiers like command-line tools has one nice
> benefit; that is, you can just sequence blocks without restrictions as
> there is no initial header (which can be a downside in some cases too).

Could you explain a little more? I haven't looked at the C LZF code, to avoid possible legal problems. I can do that now that my Java LZF extensions appear complete.

> What kind of improvements are there? Better hashing? I assume changes
> to format wouldn't be needed?

No changes to format -- just a system that stores more hashes and checks for the best of several candidate back-references. There is also a variant that hashes *all* bytes rather than just the literals and the last couple from each back-reference, but that version is still too slow to be useful.

As it stands now, it looks like these won't make it into the H2 codebase, because Mr. Mueller wants to keep the code pared down, but here's a teaser in case anyone is interested in pre-release versions.

Benchmarks (Intel Core 2 Duo T5270 @ 1.4 GHz, single core only), file Parser.java, all speeds in MB/s:

Compressor   Compress rate   Compression ratio   Expand rate (old)   Expand rate (new)
FASTEST      97.5            0.2704              488.3               556.1
FAST         70.1            0.2526              --                  599.5
NORMAL       48.5            0.2413              --                  669.3
BETTER       25.2            0.2324              --                  693.8

"Fastest" corresponds to the current (optimized) compressor; "Fast" is roughly equivalent in speed to the older un-optimized version. Note that the difference between ratios of 0.27 and 0.232 is about a 15% decrease in file size, and the expansion rate grows with compression. Yes, the compression rate drops significantly, but for data that is read frequently and written infrequently, it's worth it. Expect speeds a little over double on a more modern system (say a 2.6 GHz Core 2 Duo), so even the slowest option still compresses at around 50 MB/s.

> Yes, many other projects have expressed interest in using a pluggable
> codec.

Something in the same style as the existing Inflater/Deflater and GZip/Zip Input/Output streams?
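For anyone curious what "stores more hashes and checks for the best of several candidate back-references" might look like: here is a rough, hypothetical sketch (class and method names are mine, not from the pre-release code) of keeping several recent positions per hash bucket and picking the longest match among them, instead of the single-slot table the basic LZF compressor uses.

```java
// Hypothetical sketch, not the actual pre-release code: a hash table that
// remembers several candidate positions per bucket, so the compressor can
// pick the longest of several possible back-references.
final class MultiCandidateMatcher {
    private static final int HASH_BITS = 14;
    private static final int CANDIDATES = 4;               // positions remembered per bucket
    private final int[][] table = new int[1 << HASH_BITS][CANDIDATES];
    private final int[] count = new int[1 << HASH_BITS];

    // Hash the 3 bytes at position i; caller ensures i + 2 is in bounds.
    private static int hash(byte[] in, int i) {
        int h = (in[i] & 0xFF) | ((in[i + 1] & 0xFF) << 8) | ((in[i + 2] & 0xFF) << 16);
        return (h * 0x9E3779B1) >>> (32 - HASH_BITS);      // keep the top HASH_BITS bits
    }

    /** Returns the position of the longest match for in[pos..end), or -1 if none. */
    int bestMatch(byte[] in, int pos, int end) {
        int h = hash(in, pos);
        int bestPos = -1;
        int bestLen = 2;                                   // require at least 3 matching bytes
        int n = Math.min(count[h], CANDIDATES);
        for (int c = 0; c < n; c++) {
            int cand = table[h][c];
            int len = 0;
            while (pos + len < end && in[cand + len] == in[pos + len]) len++;
            if (len > bestLen) { bestLen = len; bestPos = cand; }
        }
        table[h][count[h] % CANDIDATES] = pos;             // remember the current position too
        count[h]++;
        return bestPos;
    }
}
```

The single-slot table of plain LZF would overwrite the older position on every collision; keeping a small ring of candidates is the usual way to trade compression speed for a better ratio, which matches the FASTEST-to-BETTER spread in the table above.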
I think I can do that easily. Doesn't surprise me to see interest, since there's so little support for LZF in the Java libraries.

Cheers,
Sam Van Oort

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
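P.S. To make the "same style as GZip/Zip Input/Output streams" idea concrete, here is a minimal sketch of what such a stream wrapper could look like. The class name is hypothetical, and to keep the sketch self-contained it only emits uncompressed (type 0) chunks in the "ZV"-magic chunk layout that the C lzf command-line tool uses, as I understand it; a real implementation would plug the compressor in and emit type 1 chunks when compression helps.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of an LZF stream wrapper in the style of GZIPOutputStream.
// Chunk layout (as I understand the C lzf CLI format): 'Z', 'V', type byte,
// big-endian 16-bit length(s), data. Only uncompressed (type 0) chunks are
// written here; a real codec would compress and emit type 1 chunks instead.
class LZFChunkedOutputStream extends FilterOutputStream {
    private static final int MAX_CHUNK = 0xFFFF;  // chunk lengths are 16-bit
    private final byte[] buf = new byte[MAX_CHUNK];
    private int used;

    LZFChunkedOutputStream(OutputStream out) { super(out); }

    @Override public void write(int b) throws IOException {
        buf[used++] = (byte) b;
        if (used == MAX_CHUNK) flushChunk();      // chunk is full, emit it
    }

    @Override public void flush() throws IOException {
        flushChunk();
        out.flush();
    }

    private void flushChunk() throws IOException {
        if (used == 0) return;
        out.write('Z'); out.write('V'); out.write(0);   // type 0 = uncompressed
        out.write(used >>> 8); out.write(used & 0xFF);  // big-endian length
        out.write(buf, 0, used);
        used = 0;
    }
}
```

Because there is no stream-level header, chunks written this way can simply be concatenated, which is the sequencing benefit mentioned earlier in the thread.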
