TreeMap Algorithmics

Morgan Schweers Wed, 16 Feb 2000 15:27:20 -0800
Greetings,
    I want to proffer an algorithm that I came up with (but may not be
original!), that I'm not completely sure of.  I'm not sure of it because
of two things, one is a general unease about how good my own skills at
creating new algorithms are, and the second is that it seems 'obvious'
enough that having not seen it *anywhere* else makes me worried.

    I put the subject as TreeMap, because it's where I'd very much like
to see this used...  The concept is as follows:

        For each node in a balanced binary (could be used for n-way trees,
easily) tree, store a counter (int?) which identifies the number of
nodes beneath it.  This is an easy number to keep track of, just
modifying it as you make additions to the tree.  When you do a rotation
to balance the tree, you have to update the values you're rotating to
the new values, but they should be easy to calculate.

        Yes, this means you touch (RMW) all the values in the path to the new
node, which can be a performance hit during initial random creation of a
large tree.  Having not mentally worked it out in detail, I'm guessing
it probably makes adding and deleting a new node about 2log2(n)?  It
does not affect normal (keyed) lookup time at all, however.

        The value of storing this additional information is that you can now,
with log2(n) speed (if I'm remembering my algorithmics right) locate any
arbitrary INDEXED reference in the tree.

        In other words, say you have a tree of 10,000 keys.  In order to find
out which name is the 5,000th, you would normally have to do a
sequential search from the leftmost node (presuming leftmost is the
first in order).  This is, obviously, an abysmal idea.  With this
indexing scheme, you can reference the 5000th, and find it by simply
doing a binary walk through the count of indexes beneath the left and
right nodes of the root.  At the same time you can look up any arbitrary
value, the way you normally would.

        This is ESPECIALLY relevant in Java, with Swing's data model
interface.  If you can display a list of 10,000 names in alphabetical
order, and be able to search to the 5000th with minimal overhead, then
it becomes possible to display this to the user in a very clean fashion,
with VERY good response time for nearly arbitrarily sized datasets.

        Most of the other ways of doing this seem to require more storage, or
more time, or a static dataset in order to work cleanly.  I feel that
this would be a wonderful addition to the project, although it's not
specifically oriented towards Sun Java compatibility.

    Back to Classpath specifics comments, the question may arise: why
can't I simply take the Classpath code and add this myself?  Because at
one point in time, I learned a good deal of what I know about Java from
reading the source files in src.zip, before I realized that I was
'soiling' my mind from being able to work on projects like Classpath.  I
don't recall if I looked at the TreeMap code, but it's quite possible I
did (I know I looked at Vector, for instance), and so I don't wish to be
the introducer of unclean code.

    I hope that someone will take this idea and run with it.  I haven't
seen ANYTHING (and have read all of Comer, et al, looking for it!) that
offers the same characteristics for both indexed and keyed lookup.  If
someone knows of a better algorithm, please enlighten me!

    Since I'm uncertain about being able to donate code, I hope that the
occasional algorithm or implementation thought can be helpful.

                                              --  Morgan Schweers
TreeMap Algorithmics

Reply via email to