-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
As you are seeing it is hard to keep to one topic as they are all related. I didn't want to get into the space stuff, but since you asked, here are some comment directed at those:
In both btree and heap rows are only marked deleted and then sometime later the space is reclaimed. Here is what triggers that work.
"online" during BTREE split: ~ Whenever there is not enough room on a leaf to do an insert the ~ code attempts to find space on the leaf, by seeing if it can reclaim ~ any committed deletes on that leaf. That work only requires the ~ latch on the leaf and row locks. It is expected that most of the ~ space reclaim done in the btree goes through this path. This is ~ the code you see in BTreeController.reclaim_deleted_rows.
BTREE post commit work: ~ Whenever a delete operation deletes the last row from a leaf ~ page then a ~ BtreePostCommit job is queued to be executed after the transaction ~ which did the delete commits. This work currently requires a ~ table level lock as page merges have not been implemented to be ~ allowed concurrent with other operations. I believe many db's don't ~ even try to do page merges except when called from some sort of ~ reorg utility. If all rows on page are purged, then it will move ~ to the free list and perform a merge to the tree.
~ It is expected that the normal space reclamation happens with row ~ locks during btree split, which is why not much work has been done ~ to optimize btree post commit path.
Heap post commit work: ~ When a delete operation deletes the last row from any heap page then ~ a HeapPostCommit job is queued to be executed after the transaction ~ which did the delete commits. This work requires latch on the ~ page being processed and row locks on rows being deleted. If all ~ rows are purged then page is put on a free list, if not then page ~ is marked as having more space for future inserts.
Dibyendu Majumdar wrote: | Mike, | | My initial comments/questions follow, more to come ... | | |>Btree's are based on ARIES. | | | Do you mean that BTree implementation uses ARIES style logging and | recovery, or does it implement the concurrency and recovery logic of | ARIES/IM? Or both? | | Reading the code, I cannot see that ARIES/IM is used apart from the | data row locking. | | |>Structure modifications are all |>isolated from other operations through the use of latches. | | | I am trying to understand the splitting logic which requires structural | change. It seems to me that it works as follows: | | If a key cannot be inserted in a page, then check if parent has enough space | to insert the split key. | If not, go to root and split the tree top-down along the path that leads to | the page that the new key is being inserted at. | Retry inserting the key. | | The logic repeats until the key is inserted. | | The basic idea seems to be that the split is deferred until it is ensured | that space is available in the parent page. | The exact mechanism is not clear to me as I am unable to follow the logic | which uses recursion, and seems quite complex. | | The split seems to be managed as an "internal transaction" which is | described as follows: | | "Start an internal transaction. An internal transaction is a completely | separate transaction from the current user transaction. All work done | in the internal transaction must be physical (ie. it can be undone | physically by the rawstore at the page level, rather than logically | undone like btree insert/delete operations). The rawstore guarantee's | that in the case of a system failure all open Internal transactions are | first undone in reverse order, and then other transactions are undone | in reverse order. | Internal transactions are meant to implement operations which, if | interupted before completion will cause logical operations like tree | searches to fail. This special undo order insures that the state of | the tree is restored to a consistent state before any logical undo | operation which may need to search the tree is performed." | | FYI, I am looking at the following methods: | | org.apache.derby.impl.store.access.btree.BTreeController.start_xact_and_dosp | lit() | org.apache.derby.impl.store.access.btree.BranchControlRow.restartSplitFor() | org.apache.derby.impl.store.access.btree.BranchControlRow.splitFor() | org.apache.derby.impl.store.access.btree.LeafControlRow.splitFor() | | |>In both the btree and the heap, delete are first executed by marking |>a "deleted" bit. This is to insure space on the page for abort, since |>row level locking will allow other rows on the page to be modified |>conncurrently with transaction executing the delete. The system uses |>a background daemon to schedule work after commit to reclaim the space |>of the deleted rows. A row marked deleted can be "purged" if one can |>obtain a lock on it (if it was an uncommitted delete then the |>transaction doing the commit would still have an exclusive lock on the |>row). | | | I assume you are referring to: | | org.apache.derby.impl.store.access.btree.BTreePostCommit. | | The comment in the code says: | | "The current space reclamation algorithm requires a table level | lock on the btree - this is mostly because the shrink algorithm | is not multi-user. This lock is requested NOWAIT as it does | not want to impedede normal operation on the table. If the lock | were to wait then the current lock manager livelock algorithm | would block all subsequent lock requests on this btree even if | they are compatible with the current holder of the lock." | | But I also find that deleted rows are also reclaimed during normal insert | operations. For example, in: | | org.apache.derby.impl.store.access.btree.BTreeController.reclaim_deleted_row | s(). | | The comment for this method says: | | "Attempt to reclaim committed deleted rows from the page. | Get exclusive latch on page, and then loop backward through | page searching for deleted rows which are committed. The routine | assumes that it is called from a transaction which cannot have | deleted any rows on the page. For each deleted row on the page | it attempts to get an exclusive lock on the deleted row, NOWAIT. | If it succeeds, and since this row did not delete the row then the | row must have been deleted by a transaction which has committed, so | it is safe to purge the row. It then purges the row from the page. | Note that this routine may remove all rows from the page, it will not | attempt a merge in this situation. This is because this routine is | called from split which is attempting an insert on the given page, so | it would be a waste to merge the page only to split it again to allow | the insert of the row causing the split." | | | Regards | | Dibyendu | | | -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFB9EaGEpeslyHqPs0RArj9AKCG5pVPY1U9yIvaB7f6vZSfXM9lxwCeODHt uTYEODNS05NE0N/9tMpudXg= =Niv+ -----END PGP SIGNATURE-----