heads up about DIRSERVER-1377/1369

Emmanuel Lecharny Sun, 19 Jul 2009 03:04:39 -0700

Hi,

So I think the issue is now fixed (crossing fingers). I am writing this mail to give some information about what has been done.

First, I would like to thank Kiran, who supported me during the long process of trying to get the bug fixed. I would also like to thank Sumit Goyal and Murali Krishna, who found the bug and submitted a test to exhibit the problem. It was incredibly useful.

It took me a month (although I didn't work only on this problem, as I also had to deal with client work at the same time), and I can tell you that it was the most complicated problem I have ever had to work on. I'm not 100% sure what exactly the problem was. I'm afraid that I just put it aside when I removed the AvlTree implementation and replaced it with a simpler data structure.

When the issue was first created, I thought it was a problem with MINA, as we had a concurrency issue with version 2.0.0-M4. Then I bumped the version up to 2.0.0-M5 and closed the initial issue. Sadly, after some more tests, another issue showed up, with a different error message, and the JIRA was reopened.

A test was provided, which was the most useful thing possible, as it helped me to reproduce the error on my computer. We started some more debugging sessions, and the problem occurred almost immediately.

But first, we saw another problem: an index (SubLevel) was growing without bound. Kiran and I debugged the server and found the reason. At least it was easy to fix, even if it wasn't related to the initial issue. Lesson learned: when something is wrong, it's very likely that you will find other problems nearby while investigating :). The bad news is that many parts of the server need more thorough tests :/

While reviewing the index handling, as the problem seemed to be in this area (a few stack traces helped to show that, at least: sometimes old-school debugging with printf comes to the rescue...), we also cleaned up many small things, like useless deletions of non-existent elements in indexes.

Then we started to think about the way the server stores data on disk. Basically, you have N threads updating many indexes and the master table, and all those reads and writes are done in a thread-safe part of the server (the JDBM backend). As every access to JDBM is synchronized, we discarded the idea that JDBM was buggy at this point. That narrowed the issue down to what sits between the interceptor chain and JDBM.

Then we saw that the place where we update the indexes was not thread safe. In other words, as we have to update around 10 indexes when updating an entry, it must be done in a way that guarantees that the indexes can't be corrupted by another thread. This was not the case. We thought we had found the cause of the concurrency problem. We fixed it by adding a hell of a lot of synchronized blocks all over this part of the code.
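
To illustrate the kind of fix described above, here is a minimal Java sketch of updating a master table and an index under a single lock. It is not the ApacheDS code: the class IndexedStoreSketch, its fields and its methods are hypothetical, and the real server updates around 10 indexes rather than one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for a partition with a master table and one index.
class IndexedStoreSketch {
    private final Object lock = new Object();
    // master table: entry id -> objectClass (illustration only)
    private final Map<Long, String> master = new HashMap<>();
    // index: objectClass -> sorted ids of the entries having it
    private final Map<String, TreeSet<Long>> objectClassIdx = new HashMap<>();

    // The master table and the index change together, under one lock,
    // so no other thread can observe (or cause) a half-applied update.
    void add(long id, String objectClass) {
        synchronized (lock) {
            String previous = master.put(id, objectClass);
            if (previous != null) {
                removeFromIndex(previous, id);
            }
            objectClassIdx.computeIfAbsent(objectClass, k -> new TreeSet<>()).add(id);
        }
    }

    void delete(long id) {
        synchronized (lock) {
            String objectClass = master.remove(id);
            if (objectClass != null) {
                removeFromIndex(objectClass, id);
            }
        }
    }

    // Must be called with the lock held.
    private void removeFromIndex(String objectClass, long id) {
        TreeSet<Long> ids = objectClassIdx.get(objectClass);
        if (ids != null) {
            ids.remove(id);
            if (ids.isEmpty()) {
                objectClassIdx.remove(objectClass);
            }
        }
    }

    // True when every entry of the master table is indexed exactly once.
    boolean isConsistent() {
        synchronized (lock) {
            int indexed = 0;
            for (TreeSet<Long> ids : objectClassIdx.values()) {
                indexed += ids.size();
            }
            return indexed == master.size();
        }
    }

    int size() {
        synchronized (lock) {
            return master.size();
        }
    }
}
```

A single coarse lock is the simplest way to get this guarantee. It serializes all writers, which is a throughput cost the sketch accepts for the sake of clarity.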

Great success! When the multithreaded test was launched, instead of failing after a few hundred updates, it now ran well for thousands of updates (I got up to 700 000 updates before it broke).

That was the good news. The bad news was that the problem was still around, but deeper. At least we knew for sure that we had removed a real concurrency issue in the code. So what could the problem be?

When we store data in indexes, we store one key associated with many values (for instance, an ObjectClass can be related to many entries). In order to manage this, we use an ordered data structure to store those values: depending on the number of values, either an AvlTree or a BTree. This AvlTree has to be serialized and deserialized in order to be stored on disk. This is where I thought we could have an issue: the de/serialization was done outside of the synchronized part, which is not a problem per se, but we send a byte[] to JDBM and get one back. What is fishy here is that JDBM caches some data and returns a *reference* to the data. Getting a reference to a byte[] means that some other thread can perfectly well modify this byte[] at the same time, with a potential impact on the expected result. It sounded like a good catch, so I modified JDBM to make it return a copy of the byte[]. It didn't change anything: I still got the error after a few hundred thousand updates.
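
The difference between handing out a reference and handing out a copy can be shown with a small Java sketch. This is not JDBM's API: CopyingCacheSketch and its methods are hypothetical names used only to illustrate the aliasing problem suspected above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache of serialized values, for illustration only.
class CopyingCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();

    // Store a private copy so the caller cannot change the cached bytes later.
    synchronized void put(String key, byte[] value) {
        cache.put(key, Arrays.copyOf(value, value.length));
    }

    // Risky variant: the caller shares the very array held by the cache,
    // so a write made by one thread is visible to every other holder.
    synchronized byte[] getReference(String key) {
        return cache.get(key);
    }

    // Defensive variant: each caller gets its own array.
    synchronized byte[] getCopy(String key) {
        byte[] value = cache.get(key);
        return value == null ? null : Arrays.copyOf(value, value.length);
    }
}
```

Synchronizing the accessors does not help with the reference variant: the lock is released as soon as the method returns, while the shared array stays reachable from both sides.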

The next step was to start suspecting the AvlTree implementation. I decided that rewriting this part was a sane idea. What a fool... AvlTree is a complex data structure, and writing an iterative implementation needs time and a hell of a lot of tests. Basically, as I was stuck with no other ideas, I followed this one, even if it was crazy. (Some little voice in my head was telling me I'm not a student any more and this was a waste of time, but the other side of my personality told me that I was smart enough to do better than what we had...) The bad thing about AVL trees is that the documentation about them on the internet is *very* rough. There are some C implementations, but they use pointers, something we don't have in Java, which makes them quite complex to port. We also have some specific needs, like being able to move backward and forward in the tree, which adds more constraints.

After around two and a half weeks, with a partially working AvlTree implementation, I had a better idea. In the meantime, Kiran had implemented an in-JDBM serialization, removing the previous potential problem with byte[] copies.

So the new idea was that we don't need an AvlTree, we just need a sorted array, which is less costly to search (a binary search in a sorted array takes about log2(N) comparisons, compared to up to 1.44 log2(N) for an AvlTree). Sure, insertion and removal are more costly, as they need an array copy, but as we don't do a lot of additions or removals, that was a good trade. More importantly, it saves a lot of memory, and serialization is way easier (memory shrinks by a factor of 4 for references to an entry, as we have fewer pointers: no left and right child, previous and next leaf, or balance). Less memory consumption means more cache, less garbage collection, fewer pointer updates, and faster reads and writes on disk.
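
The trade-off described above can be sketched in a few lines of Java. This is not the ArrayTree class from the server: SortedLongArraySketch is a hypothetical, simplified version that stores plain long values and only shows binary-search lookup plus copy-on-insert and copy-on-remove.

```java
import java.util.Arrays;

// Hypothetical sorted-array container, for illustration only.
class SortedLongArraySketch {
    private long[] values = new long[0];

    // Lookup is a binary search: about log2(N) comparisons.
    boolean contains(long value) {
        return Arrays.binarySearch(values, value) >= 0;
    }

    // Insertion copies the array: O(N), acceptable when writes are rare.
    boolean insert(long value) {
        int pos = Arrays.binarySearch(values, value);
        if (pos >= 0) {
            return false; // already present
        }
        int at = -(pos + 1); // insertion point encoded by binarySearch
        long[] grown = new long[values.length + 1];
        System.arraycopy(values, 0, grown, 0, at);
        grown[at] = value;
        System.arraycopy(values, at, grown, at + 1, values.length - at);
        values = grown;
        return true;
    }

    // Removal also copies the array, skipping the removed slot.
    boolean remove(long value) {
        int pos = Arrays.binarySearch(values, value);
        if (pos < 0) {
            return false; // not present
        }
        long[] shrunk = new long[values.length - 1];
        System.arraycopy(values, 0, shrunk, 0, pos);
        System.arraycopy(values, pos + 1, shrunk, pos, values.length - pos - 1);
        values = shrunk;
        return true;
    }

    int size() {
        return values.length;
    }

    // Returns a copy so callers cannot break the ordering.
    long[] toArray() {
        return values.clone();
    }
}
```

With this layout, moving backward or forward from an element is just an index decrement or increment, and the whole structure is one contiguous array with no per-node links to serialize.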

I ended up writing this ArrayTree implementation, which took me three days, with all the necessary tests and a few problems with the current Cursor interface (there are some semantics I really don't like in this area). Then I just had to substitute ArrayTree for AvlTree, and voilà.

Tests went up to 2 000 000 updates without any problem. Twice. So I closed the issue.

My understanding is that the previous AvlTree had a problem in the way it managed the next/previous index. I didn't spend time analysing the code, it was too complex. I also added a lot of debug logs in the server and tried to see what they gave. That led me to grep/sed a few gigabytes of logs, with little success. It just sucked up time, with no result except to eliminate other options.

Last but not least, what made me think that the bug was a problem in AvlTree in some very specific cases is that we always had an NPE on the very same line, and that the analysis of the byte[] taken from the JDBM backend showed that some elements were missing from the serialized tree. As the tree was correctly serialized (I added a check in the serialize() method: the byte[] was deserialized before being stored in JDBM to be sure that the serialization was OK, and I had no problem with it), that means something was broken while rebuilding the AvlTree.
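
The round-trip check mentioned above (deserialize the bytes before storing them) might look like the following Java sketch. It is an illustration under assumptions, not the server's serialize() method: RoundTripCheckSketch, its wire format (an int count followed by the long values) and its method names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical serializer with a self-check, for illustration only.
class RoundTripCheckSketch {
    // Assumed format: element count (int), then each value (long).
    static byte[] serialize(long[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (long value : values) {
                out.writeLong(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static long[] deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long[] values = new long[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serialize, then immediately read the bytes back and compare,
    // so a broken serialization is caught before anything is stored.
    static byte[] serializeChecked(long[] values) {
        byte[] data = serialize(values);
        if (!Arrays.equals(values, deserialize(data))) {
            throw new IllegalStateException("round trip mismatch");
        }
        return data;
    }
}
```

A check like this only proves that the bytes were right at the moment they were written. It says nothing about what happens to them, or to the structure rebuilt from them, afterwards.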

It now seems to work, but I'm not 100% sure the problem is really fixed. I think I just pushed some dust into the trash bin, and I hope I was not pushing it under the rug...

In any case, this part of the server needs a deep check, as I saw many potential problems that need to be addressed. We should probably run this multi-threaded test on a more powerful computer to see if it is really fixed.



--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org
