Hi guys, I have been pretty busy these last 2 weeks working on the Fortress integration and on the LDAP API release (besides other things).
As of today, the Mavibot bulkloader, which had been put aside for days with a few pieces of the algorithm left to implement, is working. I have tested it with 1 000 000 elements, and it's all good. (It has also been tested with many btrees of incremental sizes, to be sure we don't miss corner cases.)

The goals were:
- have the bulkloader be part of Mavibot, instead of having it in Mavibot-Partition only
- have it accept as many entries as possible. The previous implementation could not handle millions of elements
- have it use a limited amount of memory: we now keep only one page per btree level

To cope with the potentially huge number of elements we have to sort before loading them into the btree, we use temporary files, each holding N sorted elements. Then we do a kind of merge sort, pulling the next smallest element from one of the files. The number of elements in each file is configurable, since we sort them in memory. In my tests, above 16 384 elements per file I don't see a significant improvement.

The perf tests I have done on my laptop show that I can load up to 56 600 tuples per second. Don't expect the same performance when it comes to loading LDAP entries in the server! I suspect it will be 10 times slower (still, 5 000 entries added per second would be a great improvement over what we have now).

There are still some steps to complete:
- multi-value support: it's all about bulkloading the values when we have many. Should be easy to implement
- use the bulkloader in the ApacheDS mavibot-partition
- add a CLI for the mavibot bulkloader and the mavibot-partition bulkloader
- add an in-memory bulkloader
- clean up the code, which has many redundancies atm

Anyway, it's making progress. I'll probably cut a Mavibot release tomorrow, which will allow me to cut an ApacheDS release too.

Thanks!
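For readers curious about the sort-then-merge scheme above, here is a rough sketch in plain Java. It uses integers instead of real Mavibot tuples, and the names (ExternalSorter, writeSortedRuns, mergeRuns) are made up for illustration, not Mavibot's actual API:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class ExternalSorter {

    // Phase 1: read the input in chunks of chunkSize elements, sort each
    // chunk in memory, and write it to its own temporary file (a "run").
    static List<Path> writeSortedRuns(Iterator<Integer> input, int chunkSize)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>();
        while (input.hasNext()) {
            chunk.add(input.next());
            if (chunk.size() == chunkSize || !input.hasNext()) {
                Collections.sort(chunk);
                Path run = Files.createTempFile("run", ".tmp");
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(run)))) {
                    for (int value : chunk) {
                        out.writeInt(value);
                    }
                }
                runs.add(run);
                chunk.clear();
            }
        }
        return runs;
    }

    // Phase 2: k-way merge. A priority queue holds the current head element
    // of each run; we repeatedly pull the smallest head (so elements come
    // out globally sorted) and refill the queue from that run's file.
    static List<Integer> mergeRuns(List<Path> runs) throws IOException {
        PriorityQueue<int[]> heads =
                new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        List<DataInputStream> readers = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(runs.get(i))));
            readers.add(in);
            heads.add(new int[] { in.readInt(), i }); // { value, run index }
        }
        List<Integer> sorted = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            sorted.add(head[0]);
            DataInputStream in = readers.get(head[1]);
            try {
                heads.add(new int[] { in.readInt(), head[1] });
            } catch (EOFException endOfRun) {
                in.close(); // this run is exhausted
            }
        }
        return sorted;
    }
}
```

In the real loader the merged elements would be streamed straight into the btree pages rather than collected in a list; only one buffered element per run needs to live in memory during the merge.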
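The "one page per btree level" bound works because the elements arrive already sorted: each level only ever needs its current, partially filled page, and a full page can be written out immediately while its first key is pushed up as a separator. A toy illustration of that idea (BulkTreeBuilder is a hypothetical name, not Mavibot code, and a real loader would also flush the leftover partial pages at the end):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkTreeBuilder {
    private final int pageSize;
    // One in-progress page per btree level; index 0 is the leaf level.
    // This list is the ONLY per-tree memory the loader keeps.
    private final List<List<Integer>> currentPages = new ArrayList<>();
    private int pagesWritten = 0;

    BulkTreeBuilder(int pageSize) {
        this.pageSize = pageSize;
    }

    // Feed one key; keys must arrive in sorted order.
    void add(int key) {
        addAt(0, key);
    }

    private void addAt(int level, int key) {
        if (currentPages.size() == level) {
            currentPages.add(new ArrayList<>()); // tree grows one level taller
        }
        List<Integer> page = currentPages.get(level);
        page.add(key);
        if (page.size() == pageSize) {
            // Page is full: write it out (here we just count it) and
            // propagate its first key to the parent level as a separator.
            pagesWritten++;
            int separator = page.get(0);
            currentPages.set(level, new ArrayList<>());
            addAt(level + 1, separator);
        }
    }

    int height() {
        return currentPages.size();
    }

    int pagesWritten() {
        return pagesWritten;
    }
}
```

With a page size of 4 and 100 sorted keys, the leaf level flushes 25 pages, the level above flushes 6, the next flushes 1, and at no point is more than one page per level held in memory.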
