[htdig3-dev] Re: db & prefix compression [#948]

Sleepycat Software Sat, 24 Jul 1999 07:59:01 -0700

Hi, my name is Keith Bostic and I'm with Sleepycat Software.
Amy Adams assigned your Support-Request to me.

>        I finaly found out what happens with prefix compression.
> When a leaf page of the btree is split because it has no room for
> a new key, an entry is added in the parent page to reference the new
> leaf page. The key of the parent page entry may be reduced by prefix
> compression (assuming the last key of the initial leaf page and the last
> key of the new page have something in common).

Yes, this is correct; from the DB documentation:

        This function is used to compress the keys stored on the btree
        internal pages. The usefulness of this is data dependent, but in
        some data sets can produce significantly reduced tree sizes and
        search times.

>        In short, only keys stored in the internal btree pages are
> compressed. Since the number of internal pages in a btree is really
> small compared to the number of leaf pages, the expected gain is small.

The expected *space* gain is small.  However, wasted space on
an internal Btree page is normally a lot more interesting than
wasted space on a leaf page, as Btree internal pages are
accessed more often than leaf pages, particularly at the higher
levels of the tree.

>        What is more suprising is that more internal pages are used
> when prefix compression is used. I've inserted 5 000 000 keys (50 000
> alphabetical words, each of them inserted with a suffix number going from
> 0 to 99, in ascii). With prefix compression I have 548 internal pages,
> without prefix compression I only have 546 internal pages. 

There is a default prefix compression routine:

        If bt_prefix is not explicitly set, and no key comparison method
        is specified, a default lexical comparison method is used for
        prefix compression. If bt_prefix is not explicitly set and a key
        comparison method is specified, no prefix compression is done. It
        is an error to set bt_prefix without also specifying a key
        comparison method.

If your application-specified prefix compression function does not
perform as well as the default one, this would be the expected outcome.

>        My conclusion is that prefix compression is completely useless
> the way it's implemented.

You're concentrating on space compression and on specific types
of data.  The prefix-compress feature was originally implemented
for genome data sets, where keys are often quite long and often
identical over long prefixes.

> It would save a lot of space if the leaf
> nodes only contained the suffix of the keys. I.e.
>
>       keys: passage, password, passway
>
>       internal page key : pass
>       leaf node keys : word
>                      age
>                      way
>
>       I understand, however, that this requires a lot of modifications
> in the db code and we cannot expect it to work in short delays.

I have concerns about this approach.

To modify the tree (add, delete or change key/data pairs), you
will have to lock both the internal page and the leaf page, which
is going to affect on your levels of concurrency -- you will be
write locking internal pages during modifications, blocking all
threads attempting to access any leaf page referenced by that
internal page.

So, each tree modification you make will potentially lock out
a large chunk of the tree for both read and write access.  You
can minimize the affect by only doing this optimization in the
internal pages immediately above the leaf pages, but I still
think it's going to hurt.

For example, Berkeley DB does not currently update internal pages
during key delete, and you have now required that it do so.

We're trying to run thousands of threads through a single tree,
and a change of this nature is pretty scary.  I've never tried
it, so I can't say for sure, but my bet is that you'll seriously
affect your levels of concurrency.

> Since
> this feature is absolutely critical for htdig, I'm ready to spend all
> the time needed to implement it.

Why is this feature absolutely critical for htdig?

> I'm not familiar with your contribution
> mechanism, however. Can I have access to a CVS tree ? Can I expect a short
> delay feedback when the patch is ready a passes all the regression tests ? 

We're happy to provide snapshots of our source tree, we don't currently
export CVS access, although I could probably be talked into doing that
in September (we use SCCS internally, but will probably be switching to
CVS in late August).

If I were trying to compress the data in the tree, I would try one of
the following two other approaches.

First, it's possible to do both key and data compression on leaf pages
without affecting internal pages at all.  The literature that describes
this calls them "minimum difference btrees", where each key/data pair is
the minimum modification necessary to transform the previous key/data
pair.  The hard parts of this are deciding if this should only affect a
single leaf page or should cross leaf-page boundaries, and/or how to
compute the minimum modification in a reasonably data-set independent
fashion.  I've seen implementations of this, and it requires a fair
amount of code, but it's not an unreasonable approach.

Second, it would be interesting to do Huffman encoding of key/data pairs.
This would be a straight time/space trade-off, and could be done entirely
outside of the internal Btree algorithms.  (You don't want to use LZW or
better encoding algorithms because key/data pairs aren't usually long
enough to make the adaptive algorithms pay off.)

Neither of these approaches reduces the concurrency in the tree (other
than the first approach will require holding page locks longer than you
might otherwise).

I'd be very interested in working with someone to try either of those two
approaches.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic
Sleepycat Software Inc.         [EMAIL PROTECTED]
394 E. Riding Dr.               +1-978-287-4781
Carlisle, MA 01741-1601         http://www.sleepycat.com


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
[htdig3-dev] Re: db & prefix compression [#948]

Reply via email to