Philip Bohannon wrote some time back:
The classic problem here is a split taking place during a search.  Say  a
page C contains 50-100, and C's parent in the tree is P, which has an
entry for 50 and 101, with the entry for 50 pointing to C.

Now an inserter comes in, inserts 51 and fills up C (or fills up C's
child, causes a split, and that fills up C).  So C splits into C and C',
which is added to the right of C.

Since the inserter can't get a lock on P while holding a lock on C
(deadlock), it releases the lock on P.  At this point, a reader comes down
the tree looking for 99, looks at P and gets a pointer to C not C'. Now
inserter starts waiting for a lock on P.  When reader gets C, 99 has been
moved off to C'.

It would be interesting to know how Derby handles this (for example, [Yao]
proposes having a pointer from C to C', and I forget what Aries IM does at
the moment, but I think it starts over at the top if you fail to find a
key at the rightmost point in the page...).

Philip,

I don't fully understand how Derby works, but from what I have seen so far, I think that Derby avoids above situation as follows:

It latches the parent P first and then the child C exclusively - and only then starts the split. The searcher has to wait until the split is over.

In Mike's words:

[In order to prevent deadlocks latches requested while holding other latches are always requested top/down and left to right. Btree splits are always left to right. If for any reason the code needs to break this protocol then it will first request the latch NOWAIT and if it can't get the latch it will release all current latches and wait for the latch it is trying to get, and then after obtaining it go about getting the other latches it needs for the particular operation. While traversing down the tree Derby may hold 2 latches: one on parent and one on child.]

The question is what happens if parent P is also full and needs to be split. This is where things get really complicated. If my understanding is correct, Derby solves this in a simple and elegant way.

It releases the latches on C and P, and starts the split at P with the separator key. So, at that point, P becomes C, and P's parent, let us say Q, becomes P.

I think that this is repeated recursively up the tree until the original P has been split with room to insert the separator key. Then Derby proceeds with the original split.

Perhaps Mike or someone else can confirm or fill in with more details.

I have not seen above algorithm in any of the papers I have read - but then, my reading is limited to following papers:

C.Mohan and Frank Levine. ARIES/IM: an efficient and high concurrency index management method with write-ahead logging. ACM SIGMOD Record. V21 N2, P371-380, June 1 1992.

Jim Gray and Andreas Reuter. Chapter 15: Access Paths. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, 1993

Philip L. Lehman and S.Bing Yao. Efficient Locking for Concurrent Operations on B-Trees. Readings in Database Systems, Third Edition, 1998. Morgan Kaufmann Publishers.

David Lomet and Betty Salzburg. Access method concurrency with recovery. ACM SIGMOD, V21 N2, p351-360, June 1 1992.


Regards

Dibyendu



Reply via email to