On Mon, Sep 14, 2015 at 5:44 AM, Emmanuel Lécharny <[email protected]> wrote:
> Hi,
>
> I have looked at the code today, and I found that the way we handle the
> BtreeHeader is a bit complex, and does not fit some other ideas I had
> regarding the management of transactions.
>
> Currently, we store a map of BH where for each BTree, we have the latest
> BH (ie, the one associated with the most recent revision). When we want
> to update a btree, or read it, we first check this map and use the
> returned BH to start updating or reading the BTree.
>
> This is not good, IMO.
>
> Actually, we should always fetch the most recent revision for a given
> BTree from the BOB. That changes the implementation of the
> getBtreeHeader() method.
>
> Why should we do it differently, and how does it connect with the txns ?
> That's simple (well, sort of). Txns will hold in a working memory (WM) all
> the pages that will be updated from the beginning to the end of the
> transaction, allowing us to avoid many updates on disk. Currently, the
> way we process transactions is pretty brutal : we write the modified
> pages on disk until the end of the txn, even if we might very well
> modify one of those pages again.
>
> So the 'new way' should update the pages we have in the WM. That is
> possible if we reference pages using their offset, but then that changes
> the way we process the pages (currently, we preemptively copy a page
> that we are going to modify). We will *not* copy a page anymore if it's
> present in the WM, we will just update it. At the end, the WM will
> contain all the modified pages, and we will just have to write them on
> disk (or discard them) when we commit (or rollback) the transaction.
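
To make the WM idea above concrete, here is a rough sketch in Java. It is only an illustration under assumptions: SketchPage, SketchDisk and SketchWriteTxn are hypothetical names, not Mavibot classes, and the "disk" is just a map that counts writes. It shows pages keyed by offset, updated in place while the txn runs, and written (or dropped) only at commit (or rollback).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a B-tree page; not a Mavibot class. */
final class SketchPage {
    final long offset;
    String content; // mutable on purpose: pages in the WM are updated in place

    SketchPage(long offset, String content) {
        this.offset = offset;
        this.content = content;
    }
}

/** Hypothetical stand-in for the file: offset -> content, counting writes. */
final class SketchDisk {
    final Map<Long, String> pages = new HashMap<>();
    int writes = 0;

    void write(SketchPage page) {
        pages.put(page.offset, page.content);
        writes++;
    }
}

/** Working memory for one write transaction, keyed by page offset. */
final class SketchWriteTxn {
    private final SketchDisk disk;
    private final Map<Long, SketchPage> wm = new LinkedHashMap<>();

    SketchWriteTxn(SketchDisk disk) {
        this.disk = disk;
    }

    /** Updates a page in the WM, loading it from disk on first touch only. */
    void update(long offset, String newContent) {
        SketchPage page = wm.computeIfAbsent(
            offset, o -> new SketchPage(o, disk.pages.get(o)));
        page.content = newContent; // in place: no new copy, no disk write
    }

    /** Writes each modified page exactly once, then empties the WM. */
    void commit() {
        for (SketchPage page : wm.values()) {
            disk.write(page);
        }
        wm.clear();
    }

    /** Discards every pending change without touching the disk. */
    void rollback() {
        wm.clear();
    }
}
```

With this shape, updating the same offset twice inside one txn costs a single write at commit time, which is the saving described above.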
>
> But the current code has only two ways to fetch a page :
> - either it's in the cache, and we return it
> - or we read the page from disk
> (This is what PersistedPageHolder.getValue() does)
>
> We need to add a third possibility : to get the page from the WM when
> we are updating the BTree, and if it's not present in the WM, then fetch
> it (from the cache or the disk) and put it into the WM.
> Then the update (insert or delete) must be done without creating a copy.
>
> That is a huge change in the code... But this is necessary if we want to
> have an efficient transaction handling. It also allows us to get rid of
> those synchronized Maps containing the BTreeHeaders.
>
> One more thing (à la Apple) : we most certainly don't need to manage
> multiple values with sub-btrees in Mavibot : as soon as we have a fully
> working transaction system, we could perfectly expect the application to
> deal with such a specific case : all in all, in a Btree<K, V>, where V
> is the user's data structure, it's up to the user to make V a BTree, and
> to deal with it. As we will have a cross-b-tree transaction system, it
> won't be expensive, plus this is already what we do with JDBM, so the
> ApacheDS code will not be difficult to port.
>
we should move to explicit begin() and commit() to support the cross
BTree transactions, this will impact the ApacheDS code a bit because now
we need a txn handle to pass around

>
> A bit of work on our plates ;-)
>
yep

>
> Thoughts ?
>
>

--
Kiran Ayyagari
http://keydap.com
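
For reference, the three-step lookup (WM, then cache, then disk) combined with an explicit txn handle could look roughly like the sketch below. Everything in it is hypothetical: DemoPage, DemoTxn and DemoPageManager are illustrative names and are not part of Mavibot or ApacheDS. Copying a cached page once when it enters the WM is an assumption of this sketch, not something decided in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical page type; not Mavibot's Page interface. */
final class DemoPage {
    final long offset;
    String content;

    DemoPage(long offset, String content) {
        this.offset = offset;
        this.content = content;
    }
}

/** Hypothetical transaction handle returned by begin() and passed around. */
final class DemoTxn {
    final Map<Long, DemoPage> wm = new HashMap<>(); // working memory
    boolean open = true;
}

/** Hypothetical manager showing the WM -> cache -> disk lookup order. */
final class DemoPageManager {
    final Map<Long, DemoPage> cache = new HashMap<>();
    final Map<Long, String> disk = new HashMap<>();
    int diskReads = 0;

    DemoTxn begin() {
        return new DemoTxn();
    }

    DemoPage fetchForUpdate(DemoTxn txn, long offset) {
        if (!txn.open) {
            throw new IllegalStateException("transaction already ended");
        }
        DemoPage page = txn.wm.get(offset);          // 1. working memory
        if (page != null) {
            return page;
        }
        DemoPage cached = cache.get(offset);         // 2. cache
        if (cached != null) {
            // Copied once on entry to the WM (an assumption of this sketch),
            // so later updates in the txn touch only the WM instance.
            page = new DemoPage(offset, cached.content);
        } else {
            diskReads++;                             // 3. disk
            page = new DemoPage(offset, disk.get(offset));
        }
        txn.wm.put(offset, page);
        return page;
    }

    /** Writes every page held in the WM, then closes the handle. */
    void commit(DemoTxn txn) {
        for (DemoPage p : txn.wm.values()) {
            disk.put(p.offset, p.content);
            cache.put(p.offset, new DemoPage(p.offset, p.content));
        }
        txn.wm.clear();
        txn.open = false;
    }
}
```

A caller would do something like: txn = pm.begin(), then pm.fetchForUpdate(txn, offset) for each page touched (possibly across several BTrees), then pm.commit(txn). The handle is what the calling code would have to carry around.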
