Hi guys,

as traffic on the dev mailing list and the number of commits have both been low these last weeks, I'd like to give you a bit of a heads-up about what's going on.
First of all, we cut a release recently (2.0.0-M11), two months after M10. Many issues have been fixed, most of them Kerberos related, plus some serious bugs in the backend. The result is a server which is *very* slow when injecting modifications (I can add around 80 entries per second on my laptop, but this depends on the number of indexes you have: the more indexes you add, the slower it gets).

Those changes were absolutely necessary to guarantee that we don't have conflicts when we inject entries while we search the tree, and to be able to restart the server in good shape after a crash (something that was broken in M10). To do that, we had to use transactions in JDBM, and to serialize writes *and* reads (in other words, when doing a write, you can't search, and when doing a read, you can't write).

At this point, we have a working server, which is pretty fast as long as you don't do massive modifications. But we simply can't accept the sluggishness of the backend when applying modifications. Being able to inject only 80 entries per second is not acceptable: at that rate, injecting 1M entries would take around three and a half hours! It will not only affect the initial injection of data, it can also be a real problem with replication, where a lot of modifications can be replicated from server to server.

So we started to work on an alternative solution for the backend. Last year, a lab [1][2] was created to develop an MVCC BTree that could be used as a replacement for JDBM. The MVCC BTree works in memory, with pretty good performance (800 000 additions per second, 15 000 000 searches per second), but as I said, it's in-memory (even though it's persisted on disk). Kiran has started to write a partition for this backend, but we lack a few things to get it working efficiently: basically, we need to be able to store multiple values associated with a key.
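
As a rough illustration of the two ideas above (readers that never block on writers, and several values stored under one key), here is a minimal Java sketch. It is not Mavibot's code or API: the class name MvccMultiMapSketch, the Snapshot type and the add/snapshot methods are all hypothetical, and it cheats by copying a whole TreeMap on every write, where a real MVCC B-tree would only copy the pages on the path from the root to the modified leaf.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only: not Mavibot's API or storage format. */
final class MvccMultiMapSketch {

    /** One immutable version of the whole index. */
    static final class Snapshot {
        private final long revision;
        private final NavigableMap<String, NavigableSet<String>> data;

        Snapshot(long revision, NavigableMap<String, NavigableSet<String>> data) {
            this.revision = revision;
            this.data = data;
        }

        long revision() {
            return revision;
        }

        /** Values stored under the key in this version, or an empty set. */
        NavigableSet<String> get(String key) {
            NavigableSet<String> values = data.get(key);
            return values == null ? Collections.<String>emptyNavigableSet() : values;
        }
    }

    // The latest published version; older versions stay valid for their readers.
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0L, new TreeMap<>()));

    /** Readers grab the latest version without taking any lock. */
    Snapshot snapshot() {
        return current.get();
    }

    /** Writers are serialized among themselves, but never block readers. */
    synchronized long add(String key, String value) {
        Snapshot old = current.get();
        // Copy-on-write: the old version is never modified in place.
        NavigableMap<String, NavigableSet<String>> copy = new TreeMap<>(old.data);
        NavigableSet<String> values = new TreeSet<>(old.get(key));
        values.add(value);
        copy.put(key, Collections.unmodifiableNavigableSet(values));
        Snapshot next = new Snapshot(old.revision + 1,
                Collections.unmodifiableNavigableMap(copy));
        current.set(next);
        return next.revision;
    }

    public static void main(String[] args) {
        MvccMultiMapSketch index = new MvccMultiMapSketch();
        index.add("ou=people", "entry-1");
        Snapshot before = index.snapshot();   // a reader starts here
        index.add("ou=people", "entry-2");    // a write happens meanwhile
        System.out.println(before.revision() + " " + before.get("ou=people"));
        Snapshot after = index.snapshot();
        System.out.println(after.revision() + " " + after.get("ou=people"));
    }
}
```

A reader holding the first snapshot keeps seeing the data as it was when the search started, while the write publishes a new revision next to it. That is the property that would let us drop the read/write serialization we currently need with JDBM.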
On my side, I have started to work on a RecordManager that will allow the BTree to be backed by disk, and allow pages to be discarded from memory when we are running short of it (we are using SoftReferences for that). It's ongoing work, which will take some time (one or two months). Kiran is also working on a bulk-load utility that will speed up the loading of huge amounts of data.

All this hard work keeps us away from the project atm, but this is temporary. We expect to have a working version soon enough.

In the meantime, if you are interested in this ongoing work, please feel free to check the lab, it's available to all Apache committers. The [email protected] ML will be used to communicate, assuming the mail subject is prefixed with [mavibot].

FYI, Mavibot comes from MVBT (Multi-Version B-Tree), and the name itself was found by Ersin Er: it means Blue Boat in Turkish. This is really a great name!

[1] http://svn.apache.org/repos/asf/labs/mavibot/
[2] https://cwiki.apache.org/confluence/display/labs/Mavibot

--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
