On Thursday 2005-07-07 at 16:48:49 -0700, Paul Eggert wrote:

> Charles Levert <[EMAIL PROTECTED]> writes:
>
> >> The height of a balanced binary tree with N internal nodes always lies
> >> between lg(N + 1) and (1.4404 lg (N + 2) - 0.3277).  Here, the
> >> "internal nodes" are those with children,
> >
> > Unfortunately, in this case I use n for all
> > nodes, internal and leaves.
>
> If you have N internal nodes, then you have N + 1 leaf nodes, for a
> total of 2*N + 1 nodes, so it shouldn't be hard to substitute your
> variables for Knuth's.  But I would stick with Knuth's notation
> myself; why reinvent the wheel?
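Incidentally, the N-internal-to-(N + 1)-leaf relation quoted above is easy to spot-check mechanically.  Here is a small Python sketch of mine (the tuple representation and the random tree shapes are illustrative assumptions, not anything from the actual code):

```python
# Sanity check of the node-count relation quoted above: in the extended
# view of a binary tree, a tree with N internal (real) nodes always has
# exactly N + 1 external nodes (NULL slots), for 2*N + 1 nodes in total.
import random

def count_externals(node):
    """Count the NULL slots hanging off a tree of (left, right) tuples."""
    if node is None:
        return 1
    left, right = node
    return count_externals(left) + count_externals(right)

def random_tree(n):
    """Build some binary tree with exactly n internal nodes."""
    if n == 0:
        return None
    n_left = random.randint(0, n - 1)      # arbitrary split of n - 1 nodes
    return (random_tree(n_left), random_tree(n - 1 - n_left))

for n in (0, 1, 5, 16, 256):
    assert count_externals(random_tree(n)) == n + 1
```

The relation holds for any shape, balanced or not, since every internal node contributes two child slots while all but the root consume one.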
It turns out there is an equivalence between Knuth's internal subtree (the tree built from only the internal nodes, leaving out the leaf ones) and what's needed to represent a minimum-node-count tree for a given height.  Half of those worst-case trees have an even number of nodes, while the full extended tree always has 2*N + 1 nodes, an odd number, so the full tree cannot be the appropriate model.

My upper bound on the height is in effect

   h <= floor(1.5 * b) - 1

while Knuth's is

   h <= 1.4404 * log_2(2^b + 2) - 0.3277

I only claim that my bound holds for integral values of b (i.e., for N = n = 2^b), which is what we care about for this application, and for b >= 4.  It can be numerically verified that my bound is tighter than Knuth's only for b in { 4..11, 13, 15, 17, 19 }.  It happens that it's also exact for those same values of b.  In practice, b is CHAR_BIT, so that covers most of the values we care about.

Therefore, my upper bound cannot be deduced using Knuth's upper bound as a starting point for those small values of b.  Since this is a quite limited set of values of b, I remain satisfied with a purely numerical verification for those (by comparing to the exact values deduced from the worst-case tree of each height).

However, for even b >= 12 and for odd b >= 21, we can look at what happens every time we increase b by 2.  My upper bound increases by 3 every time b increases by 2, whether b is even or odd.  Let phi = (1 + sqrt(5)) / 2 and a = log(2) / log(phi) = 1.4404...  Knuth's bound will increase by

   a * (log_2(2^(b + 2) + 2) - log_2(2^b + 2))

for the same increase from b to b + 2.  So it suffices to show that

   3 > a * (log_2(2^(b + 2) + 2) - log_2(2^b + 2))

for a given value of b, for my upper bound to get even looser compared to Knuth's when that b is increased to b + 2.
Dividing by a and raising 2 to the power of each side (note that 2^(3/a) = phi^3 = 2*phi + 1, since phi^2 = phi + 1), this inequality can be transformed into

   2*phi + 1 > (2^(b + 2) + 2) / (2^b + 2)
   2*phi + 1 > (4 * 2^b + 2) / (2^b + 2)
   (2*phi + 1) * (2^b + 2) > 4 * 2^b + 2
   (2*phi + 1) * 2^b + (2*phi + 1) * 2 > 4 * 2^b + 2
   (2*phi + 1) * 2^b + (4*phi + 2) > 4 * 2^b + 2
   ((2*phi + 1) - 4) * 2^b > 2 - (4*phi + 2)
   (2*phi - 3) * 2^b > -4*phi
   positive > negative

which always holds, since 2*phi - 3 = 0.236... is positive.

Therefore, Knuth's bound is tighter than mine for even b = 12 and only gets tighter each time this even b is increased by 2.  Likewise, Knuth's bound is tighter than mine for odd b = 21 and only gets tighter each time this odd b is increased by 2.  A candidate bound that is looser than a known valid bound is also valid.

> > This is ok, though.  The floor(1.5 * b) formula
> > is still the best and simplest one we can use,
>
> I disagree!  You don't have a proof.  Knuth does.  You should be able
> to derive your lower bound from Knuth's; if you can't, then there's
> something really fishy going on.

I didn't have a generic proof for arbitrarily large values of b.  I did, however, have a numerical verification for every single value of b from 4 to 1023, which I believe to be correct.  In practice, b is CHAR_BIT and is unlikely to be more than the machine's word size, far less than 1023.

This may not have been as elegant or complete (since it didn't cover arbitrarily large values of b that we don't care about), but it was never fishy.  It would not have mattered to this application if my upper bound happened to break for some value of b beyond 1023, because we'll never have a tree that huge.  It still would have been safe to use.
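For what it's worth, that whole numerical verification is easy to replay.  Below is a Python sketch of mine (not the original verification code); it assumes the height h counts levels (a lone root has h = 1) and the standard minimum-node-count recurrence M(h) = M(h-1) + M(h-2) + 1 for these worst-case trees, which is the convention that reproduces the exactness claims above:

```python
# Replay of the numerical verification: for n = 2^b nodes, 4 <= b <= 1023,
# compute the exact worst-case height and compare it against
# floor(1.5 * b) - 1 and against Knuth's 1.4404 * log_2(2^b + 2) - 0.3277.
# Assumption (mine): height counts levels, so M(1) = 1, M(2) = 2, and
# M(h) = M(h-1) + M(h-2) + 1 is the minimum node count at height h.
import math

def max_height(n):
    """Largest h such that a worst-case tree of height h has <= n nodes."""
    m_prev, m_cur, h = 0, 1, 1          # M(0) = 0 (empty tree), M(1) = 1
    while m_cur + m_prev + 1 <= n:      # M(h+1) = M(h) + M(h-1) + 1
        m_prev, m_cur = m_cur, m_cur + m_prev + 1
        h += 1
    return h

def my_bound(b):
    return 3 * b // 2 - 1               # floor(1.5 * b) - 1, exact

def knuth_bound(b):
    # float, used only for the tightness comparison below
    return 1.4404 * math.log2(2**b + 2) - 0.3277

exact, tighter = [], []
for b in range(4, 1024):
    h = max_height(2**b)                # exact integer arithmetic
    assert h <= my_bound(b)             # the candidate bound never breaks
    if h == my_bound(b):
        exact.append(b)                 # bound is attained exactly
    if my_bound(b) < knuth_bound(b):
        tighter.append(b)               # mine is the tighter of the two

print("exact for b in", exact)
print("tighter than Knuth for b in", tighter)
```

Python is used here only because its integers are arbitrary-precision, so the height computation stays exact all the way to b = 1023; both printed lists come out as { 4..11, 13, 15, 17, 19 }.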