Hi guys, it's the week-end, so good to have time to get things done :-)
So mavibot + txn is making progress. I now have multiple writes gathered into one single update on disk. That saves a lot of writes. Typically, in a test, I start a txn, insert 5 values, and commit the txn, and I get just one new version written, saving 25 page writes overall (5 instead of 30). All the b-trees being updated (test, BOB and CPB) have the same revision. All the b-trees are correctly serialized on disk, although there are some optimisations to be made for the CPB content (see later). A rough sketch of such a test is at the end of this mail.

Now, I had some more thoughts on the CPB b-tree (the CopiedPagesBtree). We store a list of the pages that have been copied by a given revision, for each specific b-tree. Two things:

- as this tree contains pages that have been copied, and that will be removed when a specific revision is phased out, there is no point in keeping the original b-tree name: we don't care about this name when it comes to moving the pages to the free-list page. That will save a bit of room, and we will have fewer elements in the b-tree

- but overall, we may not need to keep this b-tree on disk at all. If the system crashes, the old revisions are dead anyway, and the associated pages are going to be moved to the free-page list anyway. The only reason we have this CPB written to disk is to be able to know which pages to remove when the system restarts. A faster solution would simply be to keep this b-tree in memory, and to use a list instead of a b-tree: we always append new copied page references at the end of this list, and we always remove them from the beginning of the list (see the second sketch below). It's a small set of data, as long as we don't keep a txn running for ever (if 30 pages are copied during a txn, we are talking about 30*8 + 8 bytes, around 256 bytes counting the reference to the next element in the list. If we keep 1 million txns, that will be 256 MB in memory. Unlikely to happen...)

If we keep this data structure in memory, we will save some disk writes, and recovering from a crash will just be a matter for the reclaimer of parsing the entire file, spotting the unused pages, and adding them to the free-list page. It should not take a lot of time, and it should not block any reads or writes, so it can be done in a dedicated thread (third sketch below).

Last, not least, the free-page list might also be kept in memory, but this is a bit more of a constraint, because that would defer any writes to the moment the reclaimer has checked all the unused pages on disk. If we don't want to block the writes, that implies that any update done until this check is completed will get its new pages from the end of the file, leading to a growing file (last sketch below). Not necessarily a big issue, but still...

Anyway, I still have to complete the Node serialization (I just completed the Leaf), to use the txn system in all the existing tests, and to reboot the reclaimer. That will be partly for today and partly for next week-end.
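Here is, roughly, what the batched-write test above looks like. This is just a sketch: the names (RecordManager, WriteTransaction, beginWriteTransaction...) are approximations of the API, not the exact signatures.

    RecordManager recordManager = new RecordManager( "test.db" );
    BTree<Long, String> btree = recordManager.getBtree( "test" );

    // One single write transaction for the 5 inserts
    WriteTransaction txn = recordManager.beginWriteTransaction();

    // None of these inserts hits the disk yet
    for ( long i = 0; i < 5; i++ )
    {
        btree.insert( txn, i, "value-" + i );
    }

    // The commit flushes everything as one new revision: the test, BOB and
    // CPB b-trees get a single new version (5 page writes instead of 30)
    txn.commit();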
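And this is what the in-memory replacement for the CPB could look like: a plain FIFO of copied-page offsets, tagged with the revision that copied them. We only ever append at the tail (when a write txn commits) and consume from the head (when the reclaimer phases a revision out). Again, the class names are illustrative, not actual mavibot classes (FreePageList is sketched further down):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class CopiedPages
    {
        /** One entry per committed revision: the pages it made obsolete */
        private static class CopiedPageEntry
        {
            private final long revision;
            private final long[] pageOffsets;    // 8 bytes per copied page

            private CopiedPageEntry( long revision, long[] pageOffsets )
            {
                this.revision = revision;
                this.pageOffsets = pageOffsets;
            }
        }

        private final Queue<CopiedPageEntry> queue = new ConcurrentLinkedQueue<>();

        /** Called at commit time: remember the pages this revision copied */
        public void append( long revision, long[] pageOffsets )
        {
            queue.offer( new CopiedPageEntry( revision, pageOffsets ) );
        }

        /** Called by the reclaimer: free the pages of every revision older
         *  than the oldest revision still in use by a reader */
        public void reclaimUpTo( long oldestLiveRevision, FreePageList freeList )
        {
            CopiedPageEntry head;

            while ( ( head = queue.peek() ) != null && head.revision < oldestLiveRevision )
            {
                queue.poll();

                for ( long offset : head.pageOffsets )
                {
                    freeList.free( offset );
                }
            }
        }
    }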
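The dedicated reclaimer thread for crash recovery could then be a one-shot scan of the file, run at reboot, off the critical path. collectLivePageOffsets() and scanAllPageOffsets() are placeholders for the real page-walking logic, they don't exist yet:

    import java.util.Set;

    public class Reclaimer implements Runnable
    {
        private final RecordManager recordManager;    // hypothetical API
        private final FreePageList freeList;          // sketched below

        public Reclaimer( RecordManager recordManager, FreePageList freeList )
        {
            this.recordManager = recordManager;
            this.freeList = freeList;
        }

        @Override
        public void run()
        {
            // Pages reachable from the current revision's b-tree roots are live
            Set<Long> live = recordManager.collectLivePageOffsets();

            // Every other page in the file belongs to a dead revision:
            // move it to the free-page list
            for ( long offset : recordManager.scanAllPageOffsets() )
            {
                if ( !live.contains( offset ) )
                {
                    freeList.free( offset );
                }
            }

            freeList.markScanCompleted();
        }
    }

    // Started once at reboot, without blocking reads or writes:
    // new Thread( new Reclaimer( recordManager, freeList ), "reclaimer" ).start();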
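Finally, the in-memory free-page list, with the constraint described above: until the reclaimer's scan has completed, it is not safe to reuse anything, so allocation simply takes new pages at the end of the file and the file grows. A minimal sketch, again with hypothetical names:

    import java.util.Deque;
    import java.util.concurrent.ConcurrentLinkedDeque;
    import java.util.concurrent.atomic.AtomicLong;

    public class FreePageList
    {
        private final Deque<Long> freeOffsets = new ConcurrentLinkedDeque<>();
        private final AtomicLong endOfFile;
        private final int pageSize;
        private volatile boolean scanCompleted = false;

        public FreePageList( long initialEndOfFile, int pageSize )
        {
            this.endOfFile = new AtomicLong( initialEndOfFile );
            this.pageSize = pageSize;
        }

        /** A page is not referenced by any live revision anymore */
        public void free( long offset )
        {
            freeOffsets.offer( offset );
        }

        /** The reclaimer has checked the whole file: reuse becomes safe */
        public void markScanCompleted()
        {
            scanCompleted = true;
        }

        /** Allocate a page: reuse a freed one when possible, otherwise
         *  take a new page at the end of the file (the file grows) */
        public long allocate()
        {
            if ( scanCompleted )
            {
                Long reused = freeOffsets.poll();

                if ( reused != null )
                {
                    return reused;
                }
            }

            return endOfFile.getAndAdd( pageSize );
        }
    }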
--
Emmanuel Lecharny
Symas.com
directory.apache.org