Hi, I have looked at the code today, and I found that the way we handle the BtreeHeader is a bit complex, and does not fit some other ideas I had regarding the management of transactions.
Currently, we store a map of BHs which holds, for each BTree, the latest BH (i.e., the one associated with the most recent revision). When we want to update or read a BTree, we first check this map and use the returned BH as the starting point. This is not good, IMO. Actually, we should always fetch the most recent revision for a given BTree from the BOB. That changes the implementation of the getBtreeHeader() method.

Why should we do it differently, and how does it connect with the txns? That's simple (well, sort of). A txn will hold in a working memory (WM) all the pages that get updated from the beginning to the end of the transaction, allowing us to avoid many updates on disk. Currently, the way we process a transaction is pretty brutal: we write the modified pages to disk as we go, until the end of the txn, even though we might very well modify one of those pages again.

So the 'new way' should update the pages we have in the WM. That is possible if we reference pages using their offset, but then that changes the way we process the pages (currently, we preemptively copy a page that we are going to modify). We will *not* copy a page anymore if it's present in the WM, we will just update it. At the end, the WM will contain all the modified pages, and we will just have to write them to disk (or discard them) when we commit (or roll back) the transaction.

But the current code has only two ways to fetch a page:
- either it's in the cache, and we return it
- or we read the page from disk (this is what PersistedPageHolder.getValue() does)

We need to add a third possibility: when we are updating the BTree, get the page from the WM, and if it's not present in the WM, fetch it (from the cache or the disk) and put it into the WM. Then the update (insert or delete) must be done without creating a copy.

That is a huge change in the code... But this is necessary if we want to have efficient transaction handling.
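To make the WM idea a bit more concrete, here is a rough sketch of the lookup order (WM first, then cache, then disk). It is not Mavibot code: Page, PageStore and WriteTransaction are hypothetical names, a page is reduced to an offset plus a String, and revisions are ignored (commit simply overwrites the page at its offset). I also assume a page is copied once when it enters the WM, so readers never see uncommitted changes, and is then updated in place.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical page: identified by its offset, with mutable content. */
final class Page {
    final long offset;
    String content;

    Page(long offset, String content) {
        this.offset = offset;
        this.content = content;
    }
}

/** Hypothetical stand-in for the page cache plus the file on disk. */
final class PageStore {
    private final Map<Long, String> disk = new HashMap<>();
    private final Map<Long, Page> cache = new HashMap<>();
    int diskWrites = 0; // counts simulated disk writes

    /** The two existing paths: return the cached page, else read it from disk. */
    Page fetch(long offset) {
        Page cached = cache.get(offset);
        if (cached != null) {
            return cached;
        }
        String content = disk.get(offset);
        if (content == null) {
            throw new IllegalArgumentException("No page at offset " + offset);
        }
        Page page = new Page(offset, content);
        cache.put(offset, page);
        return page;
    }

    /** Simulated disk write; the cache gets its own copy of the page. */
    void write(Page page) {
        disk.put(page.offset, page.content);
        cache.put(page.offset, new Page(page.offset, page.content));
        diskWrites++;
    }
}

/** Hypothetical write transaction holding its modified pages in a WM. */
final class WriteTransaction {
    private final PageStore store;
    // The working memory: pages touched by this txn, keyed by offset.
    private final Map<Long, Page> workingMemory = new LinkedHashMap<>();

    WriteTransaction(PageStore store) {
        this.store = store;
    }

    /** The third path: look in the WM first, else fetch and add to the WM. */
    Page getForUpdate(long offset) {
        Page page = workingMemory.get(offset);
        if (page == null) {
            Page source = store.fetch(offset);
            // Copied once on entry; later updates in this txn reuse this instance.
            page = new Page(source.offset, source.content);
            workingMemory.put(offset, page);
        }
        return page;
    }

    /** Write every modified page exactly once, then empty the WM. */
    void commit() {
        for (Page page : workingMemory.values()) {
            store.write(page);
        }
        workingMemory.clear();
    }

    /** Discard all pending changes; nothing reaches the disk. */
    void rollback() {
        workingMemory.clear();
    }
}
```

With something like this, updating the same page ten times within one txn costs a single disk write at commit, and a rollback costs none.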
It also allows us to get rid of those synchronized Maps containing the BTreeHeaders.

One more thing (à la Apple): we most certainly don't need to manage multiple values with sub-btrees in Mavibot. As soon as we have a fully working transaction system, we can perfectly well expect the application to deal with such a specific case: all in all, in a Btree<K, V>, where V is the user's data structure, it's up to the user to make V a BTree, and to deal with it. As we will have a cross-BTree transaction system, it won't be expensive, plus this is already what we do with JDBM, so the ApacheDS code will not be difficult to port.

A bit of work on our plates ;-)

Thoughts?
