Hi guys,

I've been working with Kiran for 3 weeks trying to fix DIRSERVER-1377. I think we have eliminated many differentt erros, and added some needed synchronization code.

Right now, I can say that I was able to run the server using a multi-threaded client creating and deleting the same entry (on enetry per thread) without problem (I have done 2 000 000 updates, with 20 threads).

But the oneLevel index is broken soon after the test has started, and I can reproduce this behavior quite easily, after only a few thousands of loops.

So i'm still digging, and now, my perception is that we have a problem in the current AVL tree implementation. I'm not very proud to admit that I have spent more than one week trying to implement my own version of those tree, using a damn smart algorithm, all iterative, with a hell of optimisation and "pointers" all over the nodes. What a *fool* !

Not only it was useless, but the code I ended with, more than 4500 lines of code, is not less likely to be bug free than the existing code. NIH syndrom, I guess.


Anyway, yesterday, I had a kind of illumination in the dark (and smelly code I'm gently floating in ...). Do we need to use AVLTrees ?

Let's see the pros and cons.

pros :
o From the academic POV, a very interesting data structure. A cool one to evaluate your students, as soon as they don't ask you to write the code o From the theorical POV : search in 1.44 O( log( N ) ), add and delete in O ( Log( N ) ) too : quite efficient
o From the ASF POV : it could be promoted to commons-collection

cons :
o A hell to implement correctly
o A hell to test thoroughly
o 4500 line sof code : a hell to maintain
o Costly memory consumption (4 references to other nodes : left and right, next and previou, a balance flag, a reference to the value)
o Complicated addition, even more complicated deletion
o Complicated serialization
o Almost impossible to enter in the code without a solid understanding of the algorithm, a few existing decent web references.

So should we continue to use them ? Is there an alternative ?


No. And *Yes*.

And a damn good alternative : a simple array of sorted elements.

Why do we immediatly think about complex data structure when simple one are offering all what we need ?

Ok, what do we need ? A place to store a set of ordered values. Thiose values can be almost any kind of objects, and we may have millions of them. Let's consider the case we just store 512 objects max, as we will use another data structure on disk when we are above thise number.

A Object[] is all what we need, assuming it's sorted. And sorting such an array is easy, as we just have to find the place where we will insert the new value, copy some part of the array to the right, and insert the data. O(Log(N)) for the search, plus a copy of all the values above the given one. Removing an element : same thing : find it, move all the elements on the left one slot, and we are done Searching : O(log(N)), as the array is sorted (use a dichotomic alogorithm. Quite easy)

Memory ? We just store an array of references to Objects, plus some extra size. Let's say we start with a 16 slots array. Not to much space is wasted. If we compare that with an AvlNode, when having more than 3 nodes, we save memory using an array even slightly too big.

Serialization is *way* easier, so is deserialization.

The whole code should be no more than a few hundred of commented and javadoced lines of code.

Last, not least, it's likely to be *faster* than the AvlTree for the basic search operation.

I'm almost done with this new implementation, just wanted to inform you about those ideas. And I also think that it will fix DIRSERVER-1377.

Last, not least, I will be MIA for 5 days - unless I find some cheap internet connection).

Thanks !

PS : Any thoughts ?

--
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

Reply via email to