Re: Derby architecture/design documents

Mike Matrigali 24 Jan 2005 00:52:28 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As you are seeing it is hard to keep to one topic as they are all
related.  I didn't want to get into the space stuff, but since
you asked, here are some comment directed at those:

In both btree and heap rows are only marked deleted and then sometime
later the space is reclaimed.  Here is what triggers that work.

"online" during BTREE split:
~    Whenever there is not enough room on a leaf to do an insert the
~    code attempts to find space on the leaf, by seeing if it can reclaim
~    any committed deletes on that leaf.  That work only requires the
~    latch on the leaf and row locks.  It is expected that most of the
~    space reclaim done in the btree goes through this path.  This is
~    the code you see in BTreeController.reclaim_deleted_rows.

BTREE post commit work:
~    Whenever a delete operation deletes the last row from a leaf
~    page then a
~    BtreePostCommit job is queued to be executed after the transaction
~    which did the delete commits.  This work currently requires a
~    table level lock as page merges have not been implemented to be
~    allowed concurrent with other operations.  I believe many db's don't
~    even try to do page merges except when called from some sort of
~    reorg utility.  If all rows on page are purged, then it will move
~    to the free list and perform a merge to the tree.

~    It is expected that the normal space reclamation happens with row
~    locks during btree split, which is why not much work has been done
~    to optimize btree post commit path.

Heap post commit work:
~    When a delete operation deletes the last row from any heap page then
~    a HeapPostCommit job is queued to be executed after the transaction
~    which did the delete commits.  This work requires latch on the
~    page being processed and row locks on rows being deleted.  If all
~    rows are purged then page is put on a free list, if not then page
~    is marked as having more space for future inserts.

Dibyendu Majumdar wrote:
| Mike,
|
| My initial comments/questions follow, more to come ...
|
|
|>Btree's are based on ARIES.
|
|
| Do you mean that BTree implementation uses ARIES style logging and
| recovery, or does it implement the concurrency and recovery logic of
| ARIES/IM? Or both?
|
| Reading the code, I cannot see that ARIES/IM is used apart from the
| data row locking.
|
|
|>Structure modifications are all
|>isolated from other operations through the use of latches.
|
|
| I am trying to understand the splitting logic which requires structural
| change. It seems to me that it works as follows:
|
| If a key cannot be inserted in a page, then check if parent has enough
space
| to insert the split key.
| If not, go to root and split the tree top-down along the path that
leads to
| the page that the new key is being inserted at.
| Retry inserting the key.
|
| The logic repeats until the key is inserted.
|
| The basic idea seems to be that the split is deferred until it is ensured
| that space is available in the parent page.
| The exact mechanism is not clear to me as I am unable to follow the logic
| which uses recursion, and seems quite complex.
|
| The split seems to be managed as an "internal transaction" which is
| described as follows:
|
| "Start an internal transaction.  An internal transaction is a completely
| separate transaction from the current user transaction.  All work done
| in the internal transaction must be physical (ie. it can be undone
|  physically by the rawstore at the page level, rather than logically
|  undone like btree insert/delete operations).  The rawstore guarantee's
| that in the case of a system failure all open Internal transactions are
| first undone in reverse order, and then other transactions are undone
| in reverse order.
| Internal transactions are meant to implement operations which, if
|  interupted before completion will cause logical operations like tree
| searches to fail.  This special undo order insures that the state of
| the tree is restored to a consistent state before any logical undo
| operation which may need to search the tree is performed."
|
| FYI, I am looking at the following methods:
|
|
org.apache.derby.impl.store.access.btree.BTreeController.start_xact_and_dosp
| lit()
|
org.apache.derby.impl.store.access.btree.BranchControlRow.restartSplitFor()
| org.apache.derby.impl.store.access.btree.BranchControlRow.splitFor()
| org.apache.derby.impl.store.access.btree.LeafControlRow.splitFor()
|
|
|>In both the btree and the heap, delete are first executed by marking
|>a "deleted" bit.  This is to insure space on the page for abort, since
|>row level locking will allow other rows on the page to be modified
|>conncurrently with transaction executing the delete.  The system uses
|>a background daemon to schedule work after commit to reclaim the space
|>of the deleted rows.  A row marked deleted can be "purged" if one can
|>obtain a lock on it (if it was an uncommitted delete then the
|>transaction doing the commit would still have an exclusive lock on the
|>row).
|
|
| I assume you are referring to:
|
| org.apache.derby.impl.store.access.btree.BTreePostCommit.
|
| The comment in the code says:
|
| "The current space reclamation algorithm requires a table level
| lock on the btree - this is mostly because the shrink algorithm
| is not multi-user.  This lock is requested NOWAIT as it does
| not want to impedede normal operation on the table.  If the lock
| were to wait then the current lock manager livelock algorithm
| would block all subsequent lock requests on this btree even if
| they are compatible with the current holder of the lock."
|
| But I also find that deleted rows are also reclaimed during normal insert
| operations. For example, in:
|
|
org.apache.derby.impl.store.access.btree.BTreeController.reclaim_deleted_row
| s().
|
| The comment for this method says:
|
| "Attempt to reclaim committed deleted rows from the page.
| Get exclusive latch on page, and then loop backward through
| page searching for deleted rows which are committed.  The routine
| assumes that it is called from a transaction which cannot have
| deleted any rows on the page.  For each deleted row on the page
| it attempts to get an exclusive lock on the deleted row, NOWAIT.
| If it succeeds, and since this row did not delete the row then the
| row must have been deleted by a transaction which has committed, so
| it is safe to purge the row.  It then purges the row from the page.
| Note that this routine may remove all rows from the page, it will not
| attempt a merge in this situation.  This is because this routine is
| called from split which is attempting an insert on the given page, so
| it would be a waste to merge the page only to split it again to allow
| the insert of the row causing the split."
|
|
| Regards
|
| Dibyendu
|
|
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB9EaGEpeslyHqPs0RArj9AKCG5pVPY1U9yIvaB7f6vZSfXM9lxwCeODHt
uTYEODNS05NE0N/9tMpudXg=
=Niv+
-----END PGP SIGNATURE-----

Re: Derby architecture/design documents

Reply via email to