I am interested in contributing code for persistent binary search
trees to the GLib project.  This would actually count as credit
towards a grad level data structures class for me, but I have a lot of
experience working with GLib in the past, and developing C-based open
source projects.  First some background on what persistence means and
why it would be nice to have available in an open source data
structures library.

A persistent binary search tree is simply a BST that is versioned over
changes, such that you can do a lookup in any previous version of the
tree.  An efficient implementation of this data structure requires
O(1) additional space per insertion into/deletion from the tree.  That
is, after n insertions, creating n different versions of the tree, the
total space required would be roughly 2*(space for a single version),
while allowing you to search in any version of the tree.  Searching in
any version takes the same number of comparisons and pointer
redirections as a tree with only a single version.

Persistent BSTs are used for operations such as a Plane Sweep [1] in
Computational Geometry.  At the same time, the code behind the
persistence can be generalized to provide an additional data structure
that performs Fractional Cascading [2] for no extra cost to the BST

There are currently no readily available open source implementations
of a persistent BST.  This would be a great tool for students,
academic researchers, and other developers.

I have noticed that GLib has a Binary Search Tree data structure,
which currently uses an AVL tree.  AVL trees can be made persistent,
but the space requirement becomes O(log n) per insert/delete instead
of O(1).

There are 2 other BSTs that would work effectively to provide a
persistent BST, while also maintaining a very efficient implementation
for when persistence is not used.  They are:
 - Treaps
 - Red-Black Trees

Treaps can be expected to outperform Red-Black Trees for any sized
tree.  The constant in the search cost for a Treap is Approximately
equal to AVL trees (and slightly less for large trees), and is always
smaller than Red-Black Trees. (It's a small constant regardless.)
Treaps are also much simpler to implement, requiring less complicated
code. However Treaps are a randomized data structure, which some
people try to stay away from.

My proposed changes to the GTree API are the addition of the following

// returns the latest version of the tree available (starts from 0)
guint    g_tree_current_version             (GTree *tree);

// creates a new version of the tree, changes to the tree will no longer
// affect the previous versions. returns the new current version
guint    g_tree_next_version                (GTree *tree);

For each lookup/traversal function, a new function which takes
arguments (GTree *tree, guint version, ...) would be added, with
existing functions becoming macros pointing to the new ones.

This API allows a version to contain more than a single insert/delete,
if desired, making them more of a logical entity.

So my questions are this:

1) Is there interest in having a persistent binary search tree in the
GLib library ?
2) If so, would there be resistance to using a random data strucuture,
i.e. a Treap to implement them ?

Thanks very much,
Dana Jansens

p.s. I sent this before subscribing, but its been stuck in the
moderator queue for more than a week now, so I'm sending again.
Apologies if you see this twice.

[1] http://en.wikipedia.org/wiki/Sweep_line_algorithm
[2] http://en.wikipedia.org/wiki/Fractional_cascading
gtk-devel-list mailing list

Reply via email to