On 12/26/2017 09:41 AM, Emmanuel Lécharny wrote:
> Hi guys,
>
> Last September, I worked on Mavibot in order to add transactions to it.
> It now works.
>
> Transactions bring many advantages to the backend:
> - they guarantee atomicity at an upper level (i.e., cross B-tree). This
>   is critical for the server, as a simple update (Add/Modify/Delete/
>   ModRDN) impacts more than one B-tree;
> - we can gather many updates per transaction, which can be used either
>   to speed up updates, if the user doesn't really care about losing a
>   few of them - commits can be done every N updates or every M minutes,
>   for instance - or to inject a huge amount of data (like what we would
>   do for a bulk load). This second usage seems good to have, no matter
>   what. FTR, injecting 100 000 elements into an <int, String> B-tree
>   takes 0.5 seconds on my laptop. Even if injecting LDAP entries takes
>   100 times more processing, that would mean 50 seconds to load 100 000
>   entries. Compare that with the current 100 entries/s we get with JDBM
>   (20 times slower).
>
> Ok, anyway, we all know that Mavibot is really badly needed.
>
> What I have in mind at the moment is that even if Mavibot is not
> complete (free pages aren't managed, dead versions aren't removed), we
> can still benefit from it.
>
> The fact that we don't clean up dead revisions is not necessarily
> critical: we can turn it into a feature, namely the ability to fetch
> data at a given revision/date. In a system where audit is critical,
> that would be a plus.
> One of the biggest issues with keeping all the revisions is that the
> database will grow fast, but we can mitigate this:
> - first, with transactions we can limit this growth: we update the
>   management B-tree only once per transaction, instead of once per
>   B-tree update, so even if we don't defer transaction commits, we
>   still limit the growth rate;
> - second, we can defer commits, as explained above.
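The deferred-commit idea quoted above (commit every N updates or every M minutes, whichever comes first) could be sketched like this. The Store/Txn interfaces here are hypothetical stand-ins, not the actual Mavibot API:

```java
// Sketch of "commit every N updates or every M milliseconds" batching.
// Store and Txn are hypothetical stand-ins for whatever transaction API
// the backend exposes - they are NOT the real Mavibot interfaces.
public class BatchedWriter {
    interface Txn { void insert(String key, String value); void commit(); }
    interface Store { Txn begin(); }

    private final Store store;
    private final int maxUpdates;      // commit after this many updates...
    private final long maxIntervalMs;  // ...or after this much time
    private Txn current;
    private int pending;
    private long lastCommit = System.currentTimeMillis();

    BatchedWriter(Store store, int maxUpdates, long maxIntervalMs) {
        this.store = store;
        this.maxUpdates = maxUpdates;
        this.maxIntervalMs = maxIntervalMs;
    }

    void write(String key, String value) {
        if (current == null) {
            current = store.begin();   // lazily open a batch transaction
        }
        current.insert(key, value);
        pending++;
        long now = System.currentTimeMillis();
        if (pending >= maxUpdates || now - lastCommit >= maxIntervalMs) {
            commitBatch(now);
        }
    }

    // Flush any partial batch, e.g. at the end of a bulk load.
    void flush() {
        if (current != null) {
            commitBatch(System.currentTimeMillis());
        }
    }

    private void commitBatch(long now) {
        current.commit();
        current = null;
        pending = 0;
        lastCommit = now;
    }

    // Demo helper: count how many commits a bulk load of `writes` updates
    // causes with the given batch size (time trigger disabled).
    static int run(int writes, int batchSize) {
        final int[] commits = {0};
        Store store = () -> new Txn() {
            public void insert(String k, String v) { /* in-memory no-op */ }
            public void commit() { commits[0]++; }
        };
        BatchedWriter w = new BatchedWriter(store, batchSize, Long.MAX_VALUE);
        for (int i = 0; i < writes; i++) {
            w.write("key" + i, "value" + i);
        }
        w.flush();
        return commits[0];
    }

    public static void main(String[] args) {
        // 250 writes with batches of 100 -> 2 full commits + 1 flush = 3
        System.out.println("commits: " + run(250, 100));
    }
}
```

With batching like this, the management B-tree is rewritten once per commit rather than once per update, which is where the growth-rate saving comes from.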
>
> It's not perfect, but if we consider a 100 000 entry database with 20
> indexes, that would mean around 5 * 20 * 1024 * N bytes added for N
> updates per day (20 indexes updated, 5 levels updated, 1024-byte
> pages). If N is 100, which is conservative for an LDAP server, this
> adds about 10 MB/day to the database, less than 4 GB/year. Ok, I know
> this is a back-of-the-envelope calculation, but it gives an idea.
>
> In any case, I do think it makes sense to offer this option to our
> users, who have been suffering for years from JDBM data corruption.
>
> Here is what I would propose:
> - add the Mavibot V2 backend, as is, with all the pros and cons
> - implement LDAP transactions as specified in RFC 5805. This will be
>   used for batched updates (a kind of bulk load)
> - keep JDBM as is
> - add a system to shrink the database (either offline or online, see
>   the end of this mail)
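The back-of-the-envelope estimate quoted above can be written out; the figures (5 B-tree levels, 20 indexes, 1024-byte pages, 100 updates/day) are the ones from the mail, not measured values:

```java
// Growth estimate from the mail: each update rewrites ~5 B-tree levels
// in each of 20 indexes, each rewritten page being 1024 bytes.
public class GrowthEstimate {
    static long bytesPerDay(long updatesPerDay, int indexes, int levels,
                            int pageSize) {
        return updatesPerDay * indexes * levels * (long) pageSize;
    }

    public static void main(String[] args) {
        long perDay = bytesPerDay(100, 20, 5, 1024); // 10 240 000 bytes
        long perYear = perDay * 365;                 // ~3.74 GB
        System.out.printf("per day: %.1f MB, per year: %.2f GB%n",
                perDay / 1e6, perYear / 1e9);
    }
}
```

That is roughly 10 MB/day and under 4 GB/year, matching the figures in the mail.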
Is the implementation of LDAP transactions required? I guess that is
just an additional feature on top of Mavibot. I mean, one could just use
Mavibot and benefit from a non-corrupt database without having
transactions, right?

> The biggest issue with keeping JDBM is that we will have to keep some
> of the locks we have added, so getting rid of them for Mavibot might
> be a bit of a pain.

With "keep JDBM" do you mean the user can decide via configuration
whether to use JDBM or Mavibot? What do you suggest as the default?

> In the longer term, I will implement free page management / old
> version removal in Mavibot, removing the growing-database issue.
>
> Regarding the Mavibot shrinking tool, I have some ideas about it.
> Basically, we need to be able to get rid of dead revisions (i.e.,
> revisions we know are no longer in use). Removing old revisions is
> easy; the problem is being sure they aren't used anymore. Doing so on
> an offline database is trivial: we can delete all of them without
> risking losing anything.
> On an online database, we have to keep track of live read
> transactions. We can safely delete all the revisions that are older
> than the oldest active read transaction. That being said, shrinking
> the database while it's in use is just a matter of blocking all
> writes, creating a new database and injecting the latest version into
> it (which may take a bit of time). Once done, we switch to the new
> database for new operations - but keep the old one for ongoing
> operations - and when no operation is using the old database anymore,
> we can delete it. This is slightly more complex, so I'm not sure it's
> worth the effort, and I'd rather spend my limited time on adding the
> free pages / old version management in Mavibot.

I agree, better to spend time implementing the "right" thing than
spending time on workarounds.
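For what it's worth, the "oldest active read transaction" rule quoted above amounts to fairly simple bookkeeping. A minimal sketch (an illustration of the idea, not Mavibot's actual code):

```java
import java.util.TreeMap;

// Sketch of dead-revision tracking: a revision becomes reclaimable once
// no live read transaction is pinned to it or to anything older.
// Coarse-grained synchronization for clarity; a real backend would use
// something finer-grained.
public class RevisionTracker {
    // revision -> number of live read transactions pinned to it
    private final TreeMap<Long, Integer> readers = new TreeMap<>();
    private long currentRevision = 0;

    // A read transaction pins the revision that was current when it began.
    synchronized long beginRead() {
        readers.merge(currentRevision, 1, Integer::sum);
        return currentRevision;
    }

    synchronized void endRead(long rev) {
        readers.merge(rev, -1, Integer::sum);
        if (readers.get(rev) == 0) {
            readers.remove(rev);
        }
    }

    // Each committed write transaction produces a new revision.
    synchronized void commitWrite() {
        currentRevision++;
    }

    // Every revision strictly below the returned value is dead: it is
    // older than the oldest active read transaction and can be deleted.
    synchronized long reclaimableBelow() {
        return readers.isEmpty() ? currentRevision : readers.firstKey();
    }
}
```

Offline, `readers` is empty by definition, so everything below the current revision is reclaimable, which matches the "trivial offline" case in the mail.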
