Re: [Mavibot] Write Transaction support

Howard Chu Wed, 11 Dec 2013 14:39:25 -0800

Emmanuel Lécharny wrote:

Hi guys,


while 1.0.0-M3 is being voted, I started to work on the next iteration.
Kiran is also working on the bulk load tool at the same time, so I will
work on the transaction support.

Updating the bTree is a fairly costly operation, as we have to copy many
pages, reclaim some free pages to store new data, and update the header
many times. We could save many of those writes if we do it at the end of
the update. One of the biggest save would be to write the header only
once (currently, it's being updated every time we ask for a free page).

But even if we implement such a mechanism, there are cases where we
would like to save more. Here are two possible use cases :
- we are injecting many entries in a BTree
- we are updating many BTrees but we want a consistant backend even if
there is a crash in the middle of all those updates, across oall the
updated BTrees (this is typically what we have to guarantee when
updating entries in LDAP).

So where does it bring us ? We need to avoid writing to disk as much as
we can up to the point we can do it safely. The idea is to copy the
pages in memory, without flushing them on disk, until we commit (or
rollback) the operation. If we have to copy the same page more than
once, then we will update the copy in memory. In any case, we operate on
a given revision for all the operation, regardless the number of updates.

When the user is done, we have two use cases :
- a commit is called :
we flush on disk the pages we have stored in memory, and when it's done,
we update the header. We are all good.
- a rollback is called :
there is nothing else to do than to discard all the pages we copied in
memory

The only pb is to manage the free pages and the deleted pages. The free
pages will be reclaimed using the existing list of free pages, and
eventually from the end of the file if we don't have enough free pages.
This is quite easy. The deleted pages is more complicated to deal with,
as we may have many pages we will free. But again, those deleted pages
can be put into the free page list. Worst case is if we can't update the
copiedPage BTree : we may lose some pages (but that would not corrupt
the backend).

This is what I foresee, I need to go a bit deeper in the algorithm, but
I don't think I'm too far from something that flies.

LMDB already does these steps. The only thing that's actually critical toproper operation is to defer the header update to the end of the commit, sokeeping a copy of this page in memory is critical. Then commit or rollback issimply a matter of writing the header back to disk or not, no other pagemanipulations will be visible without the header update. The other thing is tomake sure you're only using free pages that can't be seen by any activetransactions.


ATM, I will probaby work on teh insert() operation only, and on a BTree
only.

Thoughts ?



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: [Mavibot] Write Transaction support

Reply via email to