[CONF] Apache Labs > Algorithms

confluence Fri, 15 Jun 2012 08:28:26 -0700

     Algorithms
    Page edited by             Emmanuel Lécharny
    
        

                         Changes (4)
                                 
    

                    
    
            ...

                  if nbElems < BTree.pageSize 
        then // The page is not full, case 2 

            return insertElement(revision, addElement(revision, key, value, pos) 

                    else // The page is full, case 3 

            return page.splitAndAdd(revision, page.addAndSplit(revision, key, value, pos) 

            {code} 
 

            ...

            This helper class is used to manage the case 2 : 
 

            {code:title=InsertResult<K, V> insertElement(long, addElement(long, K, V, int)|borderStyle=solid} 

            Leaf.insertElement(revision, Leaf.addElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V> 

              newLeaf <- new Leaf( btree, revision, nbElems + 1 ); 
 

            ...

    
            
    
                            Full Content
                    
        Let's describe the algorithms we will implement.

BTree methods
Insert

When we try to insert a tuple <K, V> into a tree, we start from the root page, which can be either a Leaf or a Node. In any case, we call the insert(K, V) method on the BTree, which will get a new revision that will become the new root if the insertion succeed.

void insert(K, V)

BTree.insert(key, value) : (K, V) -> V
  get new revision
  return insert(revision, key, value)


This internal method will try to insert the tuple in the root page, and depending on the result, will either switch the root page to the newly created page, or o create a new root page containing the pivot if the old root page has been split

void insert(revision, K, V)

BTree.insert(revision, key, value) : (long, K, V) -> V
  currentRoot <- get the current root page
  result <- currentRoot.insert(revision, key, value)
  oldValue <- null
  newRoot <- null

  if result is a modified page
    then 
      newRoot <- result.modifiedPage
      oldValue <- result.oldValue
    else // It's a split page
      newRoot <- new Node(revision)
      newRoot.children[0] <- result.getLeftPage
      newRoot.children[1] <- result.getRightPage
      newRoot.keys[0] <- result.getPivot

  roots.put(<revision, newRoot>)
  currentRoot <- newRoot
  return oldValue



Delete

V delete(K)





Find

This method is used to find a specific value, given a key. 

We will have to go down the tree up to the leaf that contains the value, if the key is present in the tree. We will work on the latest revision available.

V find(K)




Browse

cursor<V> browse(K)




Page methods

Node.insert

Leaf.insert

This method is terminal (it won't go down one level like the same method in the Node class), as we are now to insert the value in the page.

We have three cases to deal with here :

	The key already exists in the page
	The key does not exist but can be added in the page
	The key does not exist and the page is full



Let's see what it means for each case


	We just have to copy the page, changing its revision, replace the value, and reference the previous and next links. We also return the modified page and the old value
	This is a bit more complex : we have to insert the key and the value at the right place in a copy of the page, then we will change the revision, and update the references to the previous and next pages
	Even more complex : the page will be split, a new pivot will be computed and send back to the caller.



Here is the algorithm :

InsertResult<K, V> insert( long, K, V )

Leaf.insert(revision, key, value) : (long, K, V) -> InsertResult<K, V>
  pos <- findPos(key)

  if pos < 0 // The key is present in the page, case 1
    then
      index <- -(pos + 1)
      return replaceElement(revision, key, value, index)
    else
      if nbElems < BTree.pageSize
        then // The page is not full, case 2
          return addElement(revision, key, value, pos)
        else // The page is full, case 3
          return page.addAndSplit(revision, key, value, pos)



This helper method is used to implement the first case :

InsertResult<K, V> replaceElement(long, K, V, int)

Leaf.replaceElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
  newLeaf <- new Leaf( btree, revision, nbElems );
  copy keys[0..nbElems] to newLeaf.keys[0..nbElems]
  copy values[0..nbElems] to newLeaf.values[0..nbElems]
  oldValue <- newLeaf.values[index]
  newLeaf.values[index] <- value
  newLeaf.prev <- this.prev
  newLeaf.next <- this.next

  InsertResult<K, V> result <- new ModifyResult()
  result.modifiedValue = oldValue
  result.modifiedPage = newLeaf

  return result



This helper class is used to manage the case 2 :

InsertResult<K, V> addElement(long, K, V, int)

Leaf.addElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
  newLeaf <- new Leaf( btree, revision, nbElems + 1 );

  copy keys[0..pos] to newLeaf.keys[0..pos]
  newLeaf.keys[pos] <- key
  copy keys[pos..nbElems] to newLeaf.keys[pos+1..nbElems+1]

  copy values[0..pos] to newLeaf.values[0..pos]
  newLeaf.values[pos] <- value
  copy values[pos..nbElems] to newLeaf.values[pos+1..nbElems+1]

  newLeaf.prev <- this.prev
  newLeaf.next <- this.next

  InsertResult<K, V> result <- new ModifyResult()
  result.modifiedValue = null
  result.modifiedPage = newLeaf

  return result




This helper method is used to create the split pages : 

InsertResult<K, V> addAndSplit( long, K, V, int )

Leaf.addAndSplit(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
  middle <- nbPageSize / 2
  leftPage <- null
  rightPage <- null

  if pos < middle
    then
      // Create the left page
      leftPage <- new Leaf(btree, revision, middle + 1)

      copy keys[0..pos] to leftPage.keys[0..pos]
      leftPage.keys[pos] <- key
      copy keys[pos..middle] to leftPage.keys[pos+1..middle+1]

      copy values[0..pos] to leftPage.values[0..pos]
      leftPage.values[pos] <- value
      copy values[pos..middle] to leftPage.values[pos+1..middle+1]

      // create the right page
      rightPage <- new Leaf(btree, revision, middle)

      copy keys[middle..nbElems] to rightPage.keys[0..middle]
      copy values[middle..nbElems] to rightPage.values[0..middle]
    else
      // create the left page
      leftPage <- new Leaf(btree, revision, middle)

      copy keys[0..middle] to leftPage.keys[0..middle]
      copy values[0..middle] to leftPage.values[0..middle]

      // Create the right page
      rightPage <- new Leaf(btree, revision, middle + 1)

      copy keys[middle..pos] to rightPage.keys[0..pos]
      rightPage.keys[pos] <- key
      copy keys[pos..nbElems] to rightPage.keys[pos+1..middle+1]

      copy values[middle..pos] to rightPage.values[0..pos]
      rightPage.values[pos] <- value
      copy values[pos..nbElems] to rightPage.values[pos+1..middle+1]

  // Update the links
  leftPage.prev <- this.prev
  leftPage.next <- rightPag
  rightPage.prev <- leftPage
  rightPage.next <- this.next

  // Get the new pivot from the right page
  newPivot <- rightPage.keys[0]

  // create the result
  InsertResult<K, V> result <- new SplitResult()
  result.newPivot <- newPivot
  result.leftPage <- leftPage
  result.rightPage <- rightPage

  return result



AbstractPage.findPos

This method is used to find the position of a key in the keys. If the key is found, we will return its position as a negative value. Otherwise, we will return the position of the key which is immediately above the key we are looking for, or the latest position in the page if the new key is higher than any existing keys.

We try to make the minimal number of comparisons while searching for the key, as it's a costly operation. If we have a tree with 1 million of elements, and pages containing 16 elements, with pages 66% full, then we will have to go down 6 levels, with around 4 comparisons per page, so around 24 comparisons for each find.

int findPos(K)

Page.findPos(key) : (K) -> int
  min <- 0
  max <- nbElems - 1

  while max > min
    do
      middle <- ( max + min + 1) / 2

      if key == keys[middle]
        then
          return - (middle + 1)

      if key > keys[middle]
        then
          min <- middle + 1
        else
          max <- middle - 1
    done

  if key < keys[max]
    then 
      return max
    else 
      if key == keys[max]
        then 
          return - (max + 1)
        else 
          return max + 1



There is a small improvment done in this algorithm : as we may find the keys before we have gone down to atomic elements, we can save some comparaison by excluding the element we just compared.
    
        
        
            Change Notification Preferences
        
        View Online
        |
        View Changes
                |
        Add Comment
            







---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
 ...
       if nbElems < BTree.pageSize 
        then // The page is not full, case 2
 return insertElement(revision, addElement(revision, key, value, pos)
         else // The page is full, case 3
 return page.splitAndAdd(revision, page.addAndSplit(revision, key, value, pos)
 {code}
 ...
 This helper class is used to manage the case 2 :
 {code:title=InsertResult<K, V> insertElement(long, addElement(long, K, V, int)|borderStyle=solid}
 Leaf.insertElement(revision, Leaf.addElement(revision, key, value, pos) : (long, K, V, int) -> InsertResult<K, V>
   newLeaf <- new Leaf( btree, revision, nbElems + 1 );
 ...

[CONF] Apache Labs > Algorithms

Algorithms

Page edited by Emmanuel Lécharny

Changes (4)

Full Content

BTree methods

Insert

Delete

Find

Browse

Page methods

Node.insert

Leaf.insert

AbstractPage.findPos

Reply via email to