[Mavibot] Transaction support

Emmanuel Lécharny Thu, 29 Oct 2015 03:15:11 -0700

Hi guys,

here are a bit of my last night's insomnia thoughts (with the shower,
and the smallest room of my appartment, this is the only place and time
I have for such thoughts atm ;-) :


- txn are now going to be explicit, ie you have to begin a txn and
commit it before and after any update, like in :

        // Inject some data
        recordManager.beginTransaction();
        btree.insert( 1L, "1" );
        btree.insert( 4L, "4" );
        btree.insert( 2L, "2" );
        btree.insert( 3L, "3" );
        btree.insert( 5L, "5" );
        recordManager.commit();

- later on, we may include the txn begin/commit into the update
operation, on a btree basis.
- the direct consequence is that we don't write any thing on the disk
before we commit. This will have a huge impact on performances as every
single update operation involve many internal updates : BOB Btree, CPB
Btree, RMH. Typically, in the previous example, we will save 5 updates
of teh BOB and CPB, plus numerous updates on the RMH, not counting the 4
updates of the btree root page and btree header.
- of course, this is not coming for free : if one forget to commit or
rollback the txn, then the pages will remain in memory...
- in order to manipulate those pages in memory, we will keep a track of
the update pages and btree headers in memory. We use Map for that :
    Map<Long, PageIO> and Map<Long, Page>
- now, we have two kind of map : one that keep a track on PageIO and the
other one that keep a track on Page. Why is that ? Because the RMH and
the BHs are not holding any value, they are specific data structures.
Every other Page hold  Keys/Values, where value sare data or pointer to
a child Page.
- we need a way to reference those Page and PageIO. For PageIO, this is
easy : we use the page offset. For plain Page, this is a different
story, because we may create new pages (actually, this *is* what we do
for every update, except that as we work in memory during a txn, we can
*modify* an existing page instead of copy it, until we need to write it
down on disk. Note that the very first time we access to a Page, either
from teh disk or from the cache, we have to copy it, no matter what).
- Those new pages don't have an offset, because we haven't written them
to disk yet. The reason is that writing such a page to the disk means we
have serialized the page - an expensive operation we want to avoid until
we have to do it, ie when we write the page to the disk -, and we have
fetched as many PageIO as necessary to write the page to the disk
(fetching a new PageIO will first look in the Free Page list or grab new
PageIO at the end of the file, expending the file).
- So we have to find a way to identify those pages by other means. One
idea is to use an incremental ID associated with the txn. Every Page
will have a Long ID.
- now, the pb is that we use offset in pages to reference child pages.
We must be able to distinguish an offset from an ID. We will use a
negative long for PageID.
- when we commit the txn, we will have to update all the references to
IDs to replace them with offsets. At this point, the easiest way is to
read all the Nodes we keep in the Map, and check all of their values to
see if we are refering to an ID instead of an Offset, and change that.
- that drives the order in which we write the pages : we have to start
from the leaves, and up to the root page
- another approach would be to keep a track of the modified values for
each Node (ie, value X has been change is now reffer to page Y, etc...).
This will be memory consuming, but CPU efficient. All in all, it's
another Map to handle : Map<PageID, Set<Modified Value position>>. This
approach is probably what we want to use...
- modifying a page instead of copying it requires we add new methods in
the Btree class. Not a huge change.
- a roolback is just a matter of deleting the txn context: straightforward !

At this point, I think we have a roadmap for txn support. I said nothing
about managing free pages here, it has already been discussed in another
thread, but I'll come later with a more explicit mail.

I'm curently working in a branch, in which I have removed the InMemory
Btrees and the support for multiple values. The branch has been created,
but I haven't commit yet anything in it : I have too many broken things.
I'll commit asap (may be this week-end if I have time to work on my free
time - familly comes first !)

Thanks !


BOB : Btree-Of-Btrees, the Btree that hold the revisions in use
CPB : Copied-Pages-Btree, the btree that holds teh list of pages that
have been copied during an update
RMH : RecordManager Header, the data structure that maintains the
pointers to the BOB and CPB, both the current and previous, plus some
extra informations, like the FreeList page
BH :  Btree Header, the structure that stors meta-information about a
B-tree (serializers, comparators, nb elements, pointer to the root page,
etc)

[Mavibot] Transaction support

Reply via email to