On Mon, Sep 14, 2015 at 4:31 PM, Emmanuel Lécharny <[email protected]> wrote:
> Le 14/09/15 04:40, Kiran Ayyagari a écrit : > > On Mon, Sep 14, 2015 at 5:44 AM, Emmanuel Lécharny <[email protected]> > > wrote: > > > >> Hi, > >> > >> I have looked at the code today, and I found that the way we handle the > >> BtreeHeader is a bit complex, and does not fit some other ideas I had > >> regarding the management of transactions. > >> > >> Currently, we store a map of BH where for each BTree, we have the latest > >> BH (ie, the one associated with the most recent revision). When we want > >> to update a btree, or read it, we first check this map and use the > >> returned BH to start updating or reading the BTREE. > >> > >> This is not good, IMO. > >> > >> Actually, we should always fetch the most recent revision for a given > >> BTree from the BOB. That change the implementation of the > >> getBtreeHeader() method. > >> > >> Why should we do it differently, and how does it connect with teh TXNs ? > >> That simple (well, sort of). txns will hold in a working memory (WM) all > >> the pages that will be updated from teh beginning to the end of the > >> transaction, allowing us to avoid many updates on disk - currently, the > >> way we process transaction is pretty brutal : we write teh modified > >> pages on disk, until teh end of the txn, even if we might very well > >> modify one of those pages -. > >> > >> So the 'new way' should update the pages we have in the WM. That is > >> possible if we reference pages using their offset, but then that changes > >> the way we process the pages (currently, we preemptively copy a page > >> that we are going to modify). We will *not* anymore copy a page if it's > >> present in the WM, we will just update it. At the end, teh WM will > >> contain all the modified pages, and we will just have to write them on > >> disc (or discard them) when we commit (or rollback) teh transaction. > >> > >> But the current code has only two way to fetch a page : > >> - either it's in the cache, and we return it > >> - or we read the page from disk > >> (This is what the PersistedPageHolder.getValue() does) > >> > >> We need to add a third possibility : to get the page from the WM, when > >> we are updating the BTree, and if it's not present in teh WM, then fetch > >> it (from the cache or the disk) and put it into the WM. > >> Then the update (insert or delete) must be done without creating a copy. > >> > >> That is a huge change in the code... But thsi is necessary if we want to > >> have an efficient transaction handling. It also allow us to get rid of > >> those synchronized Maps containing the BTreeHeaders. > >> > >> One more things (à la Apple) : we most certainly don't need to manage > >> multiple values with sub-btrees in Mavibot : As soon as we have a fully > >> working transaction system, we could perfectly expect the application to > >> deal with such a specific case : all in all, in a Btree<K, V>, where V > >> is the user's data structure, it's up to the user to make V a BTree, and > >> to deal with it. As we will have a cross- b-tree transaction system, it > >> won't be expensive, plus this is already what we do with JDBM, so the > >> ApacheDS code will not be difficult to port. > >> > > we should move to explicit begin() and commit() to support the cross > Btree > > transactions, > > Absolutely. The current transaction support is less than half baked, it > only support per-BTree transaction, which is pretty much useless. > > Actually, we do need to pass a context to each Btree operation, context > that holds the revision of each B-trees being part of the transaction. > That also mean we have one common revision for each transaction, > revision which ties all the Btrees' revisions that were part of the > transaction. > > Typically, in a ApacheDS context, when we update an entry, we nt only > update the master BTree, but the RDN btree and every btree associated > with each index. And all those updates must be visible as a whole when > we want to fetch an entry or an index for this revision. > > the solution would be to use the transaction ID (which will be > incremental) as the revision number for all the b-trees being updated > during that transaction. That saves us the pain of keeping a track of > the various B-trees revisions when applying an operation to a given > revision of a b-tree. > > +1 that is a good idea -- Kiran Ayyagari http://keydap.com
