That sounds fantastic! Thanks! In your stress test, try doing random insertions/deletions to try to find things that break the red/black invariant.
Robby On Tue, Nov 13, 2012 at 5:21 PM, Danny Yoo <[email protected]> wrote: > Hi Robby, > > I've been looking at a profile of DrRacket when opening/indenting large > files. Using the statistical profiler, I've measured a large chunk of time > (around 40%) is dedicated to the search operation of the existing > syntax-color token-tree. That seems too high, so I've been trying to > understand what's going on there. > > I suspect that the access pattern that's applied on the tree during > indentation resembles the worst-case scenario of splay trees: if we search > each node in sequential order, the structure of a splay tree will linearize, > ruining the dynamic balance. Indentation uses a repeated, sequential search > across the tokens. > > Concretely, if I add the following diagnostic to the token-tree% class: > > -------------------------------------------------- > + (define (height node) > + (cond > + ((not node) 0) > + (else > + (add1 (max (height (node-left node)) > + (height (node-right node))))))) > + > ;; Moves the node at key-position to the root > (define/public (search! key-position) > (when root > - (set! root (internal-search root key-position)))) > + (set! root (internal-search root key-position)) > + (printf "debug: size: ~a, height: ~a\n" (size root 0) (height > root))) > + > -------------------------------------------------- > > and open up a large file like collects/drracket/private/unit.rkt, I can see > that the tree is much deeper than it deserves to be. During a tabify-all, > the tree does appear to get huge; it swings between a height in the hundreds > into the thousands thousands. That's much too tall, considering that there > are only about 40000 tokens in the tree. > > > I believe that a balanced BST should give us better performance. I have > been writing an alternative implementation of the token tree based on > red-black trees. > > https://github.com/dyoo/new-token-tree > > It is unfortunately missing the necessary splitting operations. But I plan > to implement split and concat with the algorithm described in: > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.4875 > > so I don't anticipate fundamental problems. > > > I wanted to surprise you by getting this all working by mid-week, but it's > taking longer than I thought... :) So I might as well run it by you to make > sure the idea is sound before I go further on this track. _________________________ Racket Developers list: http://lists.racket-lang.org/dev

