Thanks for the reference. Knowing this is called an AVL tree has allowed me to investigate further. Unfortunately, I only own volume 2 of tAoCP. Using lg() instead of log_2() may be a computer science notational thing; I haven't seen it being used when I studied information theory.
* On Tuesday 2005-07-05 at 16:59:09 -0700, Paul Eggert wrote: > Charles Levert <[EMAIL PROTECTED]> writes: > > > -- Balanced binary trees (at every node, > > the depth difference of the two subtrees is > > at most 1). > > > > -- Worst (i.e., maximal) depth d_max(n) of such a > > tree as a function of the number n of nodes > > in it. > > The height of a balanced binary tree with N internal nodes always lies > between lg(N + 1) and (1.4404 lg (N + 2) - 0.3277). Here, the > "internal nodes" are those with children, Unfortunately, in this case I use n for all nodes, internal and leaves. > and we assume each internal > node has exactly 2 children (so that there are N + 1 leaves). At most N + 1 leaves, which complicates things in this case (i.e., finding N from n). > "lg N" > means the log base 2 of N. This is Theorem A of Knuth vol. 3 section > 6.2.3; it is originally due to Adelson-Velsky and Landis (often called > "AVL"). > > > -- Alternatively, worst (i.e., minimal) number > > n_min(d) of nodes in such a tree of depth d. > > > > n_min(0) = 0 > > n_min(1) = 1 > > n_min(d) = d + n_min(0) + ... n_min(n-2) > > Assuming you're counting only internal nodes as above, then n_min(d) > >= F(d+2) - 1, where F(d) is the d'th Fibonacci number. Therefore > n_min(d) is bounded below by ((phi)**(d+2))/(sqrt(5)) - 2, where phi > is the golden ratio (sqrt(5) + 1)/2. This result is is used in the > proof for the AVL result (as published by Knuth). I did some more testing using GNU octave. It turns out the bound only holds when we consider the n = 2^b points (with b an integer), and not all values of n. Otherwise, we would have in effect to round things up to the nearest integral b for it to stay an upper bound at all time. This is ok, though. The floor(1.5 * b) formula is still the best and simplest one we can use, because we always are in the integral b case. I just have to work a little more to document it.