Re: [Mavibot] Transaction support

Emmanuel Lécharny Thu, 29 Oct 2015 23:32:00 -0700

Le 30/10/15 04:51, Kiran Ayyagari a écrit :
> On Thu, Oct 29, 2015 at 6:13 PM, Emmanuel Lécharny <[email protected]>
> wrote:
>
>> Hi guys,
>>
>> here are a bit of my last night's insomnia thoughts (with the shower,
>> and the smallest room of my appartment, this is the only place and time
>> I have for such thoughts atm ;-) :
>>
>> - txn are now going to be explicit, ie you have to begin a txn and
>> commit it before and after any update, like in :
>>
>>         // Inject some data
>>         recordManager.beginTransaction();
>>         btree.insert( 1L, "1" );
>>         btree.insert( 4L, "4" );
>>         btree.insert( 2L, "2" );
>>         btree.insert( 3L, "3" );
>>         btree.insert( 5L, "5" );
>>         recordManager.commit();
>>
> if a user forgets to call commit() and keeps inserting we will run out of
> memory at some point
> so wondering what is the better way to prevent that from happening without
> maintaining a
> log (a WAL or something similar)


I don't really see how to do that, sadly... In some ways, we are really
in a similar situatio as when you were managing memory in C : if you
forget to call free(), you have a leak...
>
>> - later on, we may include the txn begin/commit into the update
>> operation, on a btree basis.
>> - the direct consequence is that we don't write any thing on the disk
>> before we commit. This will have a huge impact on performances as every
>> single update operation involve many internal updates : BOB Btree, CPB
>> Btree, RMH. Typically, in the previous example, we will save 5 updates
>> of teh BOB and CPB, plus numerous updates on the RMH, not counting the 4
>> updates of the btree root page and btree header.
>> - of course, this is not coming for free : if one forget to commit or
>> rollback the txn, then the pages will remain in memory...
>> - in order to manipulate those pages in memory, we will keep a track of
>> the update pages and btree headers in memory. We use Map for that :
>>     Map<Long, PageIO> and Map<Long, Page>
>> - now, we have two kind of map : one that keep a track on PageIO and the
>> other one that keep a track on Page. Why is that ? Because the RMH and
>> the BHs are not holding any value, they are specific data structures.
>> Every other Page hold  Keys/Values, where value sare data or pointer to
>> a child Page.
>> - we need a way to reference those Page and PageIO. For PageIO, this is
>> easy : we use the page offset. For plain Page, this is a different
>> story, because we may create new pages (actually, this *is* what we do
>> for every update, except that as we work in memory during a txn, we can
>> *modify* an existing page instead of copy it, until we need to write it
>> down on disk. Note that the very first time we access to a Page, either
>> from teh disk or from the cache, we have to copy it, no matter what).
>> - Those new pages don't have an offset, because we haven't written them
>> to disk yet. The reason is that writing such a page to the disk means we
>> have serialized the page - an expensive operation we want to avoid until
>> we have to do it, ie when we write the page to the disk -, and we have
>> fetched as many PageIO as necessary to write the page to the disk
>> (fetching a new PageIO will first look in the Free Page list or grab new
>> PageIO at the end of the file, expending the file).
>> - So we have to find a way to identify those pages by other means. One
>> idea is to use an incremental ID associated with the txn. Every Page
>> will have a Long ID.
>> - now, the pb is that we use offset in pages to reference child pages.
>> We must be able to distinguish an offset from an ID. We will use a
>> negative long for PageID.
>> - when we commit the txn, we will have to update all the references to
>> IDs to replace them with offsets. At this point, the easiest way is to
>> read all the Nodes we keep in the Map, and check all of their values to
>> see if we are refering to an ID instead of an Offset, and change that.
>> - that drives the order in which we write the pages : we have to start
>> from the leaves, and up to the root page
>> - another approach would be to keep a track of the modified values for
>> each Node (ie, value X has been change is now reffer to page Y, etc...).
>> This will be memory consuming, but CPU efficient. All in all, it's
>> another Map to handle : Map<PageID, Set<Modified Value position>>. This
>> approach is probably what we want to use...
>> - modifying a page instead of copying it requires we add new methods in
>> the Btree class. Not a huge change.
>> - a roolback is just a matter of deleting the txn context: straightforward
>> !
>>
>> At this point, I think we have a roadmap for txn support. I said nothing
>> about managing free pages here, it has already been discussed in another
>> thread, but I'll come later with a more explicit mail.
>>
>> I'm curently working in a branch, in which I have removed the InMemory
>> Btrees and the support for multiple values. The branch has been created,
>> but I haven't commit yet anything in it : I have too many broken things.
>> I'll commit asap (may be this week-end if I have time to work on my free
>> time - familly comes first !)
>>
> please commit it even if it is broken, I might be able to understand the
> above ideas
> if I read it again after looking at code
>
> thanks Emmanuel
>
>> Thanks !
>>
>> BOB : Btree-Of-Btrees, the Btree that hold the revisions in use
>> CPB : Copied-Pages-Btree, the btree that holds teh list of pages that
>> have been copied during an update
>> RMH : RecordManager Header, the data structure that maintains the
>> pointers to the BOB and CPB, both the current and previous, plus some
>> extra informations, like the FreeList page
>> BH :  Btree Header, the structure that stors meta-information about a
>> B-tree (serializers, comparators, nb elements, pointer to the root page,
>> etc)
>>
>>
>

Re: [Mavibot] Transaction support

Reply via email to