Cleaning upPage added by Emmanuel LécharnyIntroductionNow that we have a MVCC B+Tree that stores many revisions, we may want to clean up the old revisions that are not used anymore. The rational is that when a page has been copied, the old page is only useful if someone still holds the old revision. But when it's not the case, we can get rid of this page. We should assume that each revision is associated with a counter, and each time a user wants to do a search on a revision, it increments this counter, and decrement it when it does not use it anymore. Now that this revision's counter is 0, we can get rid of the associated tree. Seams simple, except that it's not that simple. Just because no one is using the revision does not mean that the underlying pages can be disposed. In fact, it's very likely that many pages are still referenced by newer revisions, and then are still valid pages. AlgorithmThe way we proceed is by applying a classical GC algorithm : Mark and Sweep. We start with the higher revision, and we flag all the associated pages as used. Then we do the same thing for all the older revision, marking the associated pages as free if and only if they have not already been marked as used, and if the revision being processed is not also used. When we have marked all the free pages, we can remove them. Ok, well, not that complicated. The only slight problem is that it costs a lot to read all the pages from disk, so we should try to keep the pages structure in memory as much as we can. In this case, it's probably easier to know which page has to be removed or not, assuming that we only keep a data structure that mimic the real BTree, where we just keep the PageID and the pages that reference them. That would work up to a point, as we don't have an infinite memory... FragmentationRemoving useless pages will cause some fragmentation at some point. We should also consider that we may need to reorganize the file from time to time to avoid ending with a very fragmented file. Startup compactionWe can also decide to compact the BTree at startup. This is just a matter of getting the latest revision, and to reconstruct a brand new file on disk (see the Bulk copy of data in Mavibot page)
Change Notification Preferences
View Online
|
Add Comment
|
