On Thursday 2005-07-07 at 16:48:49 -0700, Paul Eggert wrote:

> Charles Levert <[EMAIL PROTECTED]> writes:
>
> >> The height of a balanced binary tree with N internal nodes always lies
> >> between lg(N + 1) and (1.4404 lg (N + 2) - 0.3277).  Here, the
> >> "internal nodes" are those with children,
> >
> > Unfortunately, in this case I use n for all
> > nodes, internal and leaves.
>
> If you have N internal nodes, then you have N + 1 leaf nodes, for a
> total of 2*N + 1 nodes, so it shouldn't be hard to substitute your
> variables for Knuth's.  But I would stick with Knuth's notation
> myself; why reinvent the wheel?
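Incidentally, the N-internal-to-(N + 1)-leaf relation quoted above is easy to spot-check mechanically.  Here is a small Python sketch of mine (the tuple representation and the random tree shapes are illustrative assumptions, not anything from the actual code):

```python
# Sanity check of the node-count relation quoted above: in the extended
# view of a binary tree, a tree with N internal (real) nodes always has
# exactly N + 1 external nodes (NULL slots), for 2*N + 1 nodes in total.
import random

def count_externals(node):
    """Count the NULL slots hanging off a tree of (left, right) tuples."""
    if node is None:
        return 1
    left, right = node
    return count_externals(left) + count_externals(right)

def random_tree(n):
    """Build some binary tree with exactly n internal nodes."""
    if n == 0:
        return None
    n_left = random.randint(0, n - 1)      # arbitrary split of n - 1 nodes
    return (random_tree(n_left), random_tree(n - 1 - n_left))

for n in (0, 1, 5, 16, 256):
    assert count_externals(random_tree(n)) == n + 1
```

The relation holds for any shape, balanced or not, since every internal node contributes two child slots while all but the root consume one.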
It turns out there is an equivalence between Knuth's internal subtree (the tree built from only the internal nodes, leaving out the leaf ones) and what's needed to represent a minimum-node-count tree for a given height.  Half of those worst-case trees have an even number of nodes, while the full extended tree always has 2*N + 1 nodes, an odd number, so the full tree cannot be the appropriate model.

My upper bound on the height is in effect

   h <= floor(1.5 * b) - 1

while Knuth's is

   h <= 1.4404 * log_2(2^b + 2) - 0.3277

I only claim that my bound holds for integral values of b (i.e., for N = n = 2^b), which is what we care about for this application, and for b >= 4.  It can be numerically verified that my bound is tighter than Knuth's only for b in { 4..11, 13, 15, 17, 19 }.  It happens that it's also exact for those same values of b.  In practice, b is CHAR_BIT, so that covers most of the values we care about.

Therefore, my upper bound cannot be deduced using Knuth's upper bound as a starting point for those small values of b.  Since this is a quite limited set of values of b, I remain satisfied with a purely numerical verification for those (by comparing to the exact values deduced from the worst-case tree of each height).

However, for even b >= 12 and for odd b >= 21, we can look at what happens every time we increase b by 2.  My upper bound increases by 3 every time b increases by 2, whether b is even or odd.  Let phi = (1 + sqrt(5)) / 2 and a = log(2) / log(phi) = 1.4404...  Knuth's bound will increase by

   a * (log_2(2^(b + 2) + 2) - log_2(2^b + 2))

for the same increase from b to b + 2.  So it suffices to show that

   3 > a * (log_2(2^(b + 2) + 2) - log_2(2^b + 2))

for a given value of b, for my upper bound to get even looser compared to Knuth's when that b is increased to b + 2.
Dividing by a and raising 2 to the power of each side (note that 2^(3/a) = phi^3 = 2*phi + 1, since phi^2 = phi + 1), this inequality can be transformed into

   2*phi + 1 > (2^(b + 2) + 2) / (2^b + 2)
   2*phi + 1 > (4 * 2^b + 2) / (2^b + 2)
   (2*phi + 1) * (2^b + 2) > 4 * 2^b + 2
   (2*phi + 1) * 2^b + (2*phi + 1) * 2 > 4 * 2^b + 2
   (2*phi + 1) * 2^b + (4*phi + 2) > 4 * 2^b + 2
   ((2*phi + 1) - 4) * 2^b > 2 - (4*phi + 2)
   (2*phi - 3) * 2^b > -4*phi
   positive > negative

which always holds, since 2*phi - 3 = 0.236... is positive.

Therefore, Knuth's bound is tighter than mine for even b = 12 and only gets tighter each time this even b is increased by 2.  Likewise, Knuth's bound is tighter than mine for odd b = 21 and only gets tighter each time this odd b is increased by 2.  A candidate bound that is looser than a known valid bound is also valid.

> > This is ok, though.  The floor(1.5 * b) formula
> > is still the best and simplest one we can use,
>
> I disagree!  You don't have a proof.  Knuth does.  You should be able
> to derive your lower bound from Knuth's; if you can't, then there's
> something really fishy going on.

I didn't have a generic proof for arbitrarily large values of b.  I did, however, have a numerical verification for every single value of b from 4 to 1023, which I believe to be correct.  In practice, b is CHAR_BIT and is unlikely to be more than the machine's word size, far less than 1023.

This may not have been as elegant or complete (since it didn't cover arbitrarily large values of b that we don't care about), but it was never fishy.  It would not have mattered to this application if my upper bound happened to break for some value of b beyond 1023, because we'll never have a tree that huge.  It still would have been safe to use.
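For what it's worth, that whole numerical verification is easy to replay.  Below is a Python sketch of mine (not the original verification code); it assumes the height h counts levels (a lone root has h = 1) and the standard minimum-node-count recurrence M(h) = M(h-1) + M(h-2) + 1 for these worst-case trees, which is the convention that reproduces the exactness claims above:

```python
# Replay of the numerical verification: for n = 2^b nodes, 4 <= b <= 1023,
# compute the exact worst-case height and compare it against
# floor(1.5 * b) - 1 and against Knuth's 1.4404 * log_2(2^b + 2) - 0.3277.
# Assumption (mine): height counts levels, so M(1) = 1, M(2) = 2, and
# M(h) = M(h-1) + M(h-2) + 1 is the minimum node count at height h.
import math

def max_height(n):
    """Largest h such that a worst-case tree of height h has <= n nodes."""
    m_prev, m_cur, h = 0, 1, 1          # M(0) = 0 (empty tree), M(1) = 1
    while m_cur + m_prev + 1 <= n:      # M(h+1) = M(h) + M(h-1) + 1
        m_prev, m_cur = m_cur, m_cur + m_prev + 1
        h += 1
    return h

def my_bound(b):
    return 3 * b // 2 - 1               # floor(1.5 * b) - 1, exact

def knuth_bound(b):
    # float, used only for the tightness comparison below
    return 1.4404 * math.log2(2**b + 2) - 0.3277

exact, tighter = [], []
for b in range(4, 1024):
    h = max_height(2**b)                # exact integer arithmetic
    assert h <= my_bound(b)             # the candidate bound never breaks
    if h == my_bound(b):
        exact.append(b)                 # bound is attained exactly
    if my_bound(b) < knuth_bound(b):
        tighter.append(b)               # mine is the tighter of the two

print("exact for b in", exact)
print("tighter than Knuth for b in", tighter)
```

Python is used here only because its integers are arbitrary-precision, so the height computation stays exact all the way to b = 1023; both printed lists come out as { 4..11, 13, 15, 17, 19 }.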