Let's describe the algorithms we will implement.
BTree methods
Insert
When we try to insert a tuple <K, V> into a tree, we start from the root page, which can be either a Leaf or a Node. In any case, we call the insert(K, V) method on the BTree, which will get a new revision that will become the new root if the insertion succeed.
BTree.insert(key, value) : (K, V) -> V
get new revision
return insert(revision, key, value)
This internal method will try to insert the tuple in the root page, and depending on the result, will either switch the root page to the newly created page, or o create a new root page containing the pivot if the old root page has been split
BTree.insert(revision, key, value) : (long, K, V) -> V
currentRoot <- get the current root page
result <- currentRoot.insert(revision, key, value)
oldValue <- null
newRoot <- null
if result is a modified page
then
newRoot <- result.modifiedPage
oldValue <- result.oldValue
else newRoot <- new Node(revision)
newRoot.children[0] <- result.getLeftPage
newRoot.children[1] <- result.getRightPage
newRoot.keys[0] <- result.getPivot
roots.put(<revision, newRoot>)
currentRoot <- newRoot
return oldValue
Delete
Find
This method is used to find a specific value, given a key.
We will have to go down the tree up to the leaf that contains the value, if the key is present in the tree. We will work on the latest revision available.
Browse
Page methods
Node.insert
Leaf.insert
This method is terminal (it won't go down one level like the same method in the Node class), as we are now to insert the value in the page.
We have three cases to deal with here :
- The key already exists in the page
- The key does not exist but can be added in the page
- The key does not exist and the page is full
Let's see what it means for each case
- We just have to copy the page, changing its revision, replace the value, and reference the previous and next links. We also return the modified page and the old value
- This is a bit more complex : we have to insert the key and the value at the right place in a copy of the page, then we will change the revision, and update the references to the previous and next pages
- Even more complex : the page will be split, a new pivot will be computed and send back to the caller.
Here is the algorithm :
Leaf.insert(revision, key, value) : (long, K, V) -> InsertResult<K, V>
pos <- findPos(key)
if pos < 0 then
index <- -(pos + 1)
return replaceElement(revision, key, value, index)
else
if nbElems < BTree.pageSize
then return addElement(revision, key, value, pos)
else return page.addAndSplit(revision, key, value, pos)
This helper method is used to implement the first case :
Leaf.replaceElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
newLeaf <- new Leaf( btree, revision, nbElems );
copy keys[0..nbElems] to newLeaf.keys[0..nbElems]
copy values[0..nbElems] to newLeaf.values[0..nbElems]
oldValue <- newLeaf.values[index]
newLeaf.values[index] <- value
newLeaf.prev <- this.prev
newLeaf.next <- this.next
InsertResult<K, V> result <- new ModifyResult()
result.modifiedValue = oldValue
result.modifiedPage = newLeaf
return result
This helper class is used to manage the case 2 :
Leaf.addElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
newLeaf <- new Leaf( btree, revision, nbElems + 1 );
copy keys[0..pos] to newLeaf.keys[0..pos]
newLeaf.keys[pos] <- key
copy keys[pos..nbElems] to newLeaf.keys[pos+1..nbElems+1]
copy values[0..pos] to newLeaf.values[0..pos]
newLeaf.values[pos] <- value
copy values[pos..nbElems] to newLeaf.values[pos+1..nbElems+1]
newLeaf.prev <- this.prev
newLeaf.next <- this.next
InsertResult<K, V> result <- new ModifyResult()
result.modifiedValue = null
result.modifiedPage = newLeaf
return result
This helper method is used to create the split pages :
Leaf.addAndSplit(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
middle <- nbPageSize / 2
leftPage <- null
rightPage <- null
if pos < middle
then
leftPage <- new Leaf(btree, revision, middle + 1)
copy keys[0..pos] to leftPage.keys[0..pos]
leftPage.keys[pos] <- key
copy keys[pos..middle] to leftPage.keys[pos+1..middle+1]
copy values[0..pos] to leftPage.values[0..pos]
leftPage.values[pos] <- value
copy values[pos..middle] to leftPage.values[pos+1..middle+1]
rightPage <- new Leaf(btree, revision, middle)
copy keys[middle..nbElems] to rightPage.keys[0..middle]
copy values[middle..nbElems] to rightPage.values[0..middle]
else
leftPage <- new Leaf(btree, revision, middle)
copy keys[0..middle] to leftPage.keys[0..middle]
copy values[0..middle] to leftPage.values[0..middle]
rightPage <- new Leaf(btree, revision, middle + 1)
copy keys[middle..pos] to rightPage.keys[0..pos]
rightPage.keys[pos] <- key
copy keys[pos..nbElems] to rightPage.keys[pos+1..middle+1]
copy values[middle..pos] to rightPage.values[0..pos]
rightPage.values[pos] <- value
copy values[pos..nbElems] to rightPage.values[pos+1..middle+1]
leftPage.prev <- this.prev
leftPage.next <- rightPag
rightPage.prev <- leftPage
rightPage.next <- this.next
newPivot <- rightPage.keys[0]
InsertResult<K, V> result <- new SplitResult()
result.newPivot <- newPivot
result.leftPage <- leftPage
result.rightPage <- rightPage
return result
AbstractPage.findPos
This method is used to find the position of a key in the keys. If the key is found, we will return its position as a negative value. Otherwise, we will return the position of the key which is immediately above the key we are looking for, or the latest position in the page if the new key is higher than any existing keys.
We try to make the minimal number of comparisons while searching for the key, as it's a costly operation. If we have a tree with 1 million of elements, and pages containing 16 elements, with pages 66% full, then we will have to go down 6 levels, with around 4 comparisons per page, so around 24 comparisons for each find.
Page.findPos(key) : (K) -> int
min <- 0
max <- nbElems - 1
while max > min
do
middle <- ( max + min + 1) / 2
if key == keys[middle]
then
return - (middle + 1)
if key > keys[middle]
then
min <- middle + 1
else
max <- middle - 1
done
if key < keys[max]
then
return max
else
if key == keys[max]
then
return - (max + 1)
else
return max + 1
There is a small improvment done in this algorithm : as we may find the keys before we have gone down to atomic elements, we can save some comparaison by excluding the element we just compared.