Hi,

I was looking for a better algorithm to manage the pages that can be
freed when some revisions are not used anymore. Atm, the algorithm was
to keep a track of each modified pages and to associate them with a
revision that copied them. When this revision get obsolote, we can
safely remove the copied pages. Well, I thought it was enough, but this
is not the case. It works only if we keep a track of the list of
revisions that were removed, and only if we remove the revision that is
on the left of the list (that eans we can safely remove any page copid
by a revision which is not the oldest revision).

There is another solution though. We just have to use a BTree containing
the page numbers as a key, associated with the revision that created the
page as a value. Every time a revision is obsoleted, we can safely check
the BTree, searching for pages copied by this revision, and removing the
page from the btree, moving it to the free page BTree.

For instance, let's consider the following scenario, where revisins are
noted with capital letters, and pages with numbers.
   added  copied
A  1      /
B  2,3    1->2 (1 becomes 2)
C  4,5    2->4
D  6,7    3->7,4->6
E  8,9    5->9,6->8

The BTree will contains the following Key/values, if we never remove
anything :

1 -> A
2 -> B
3 -> B
4 -> C
5 -> C
6 -> D
7 -> D
8 -> E
9 -> E

Suppose we want to keep B and E revisions only.
- When we add A, the BTree looks like :
1 -> A

- When B is created, as we copied page 1, and as revision A is not kept,
we can safely remove page 1 from the BTree and move it to the free pages
BTree.
The BTree looks lie :

[1 -> A] removed
2 -> B
3 -> B

- When C is created, as B is kept, we can't remove the <2, B> element
The bTree looks like :

2 -> B
3 -> B
4 -> C
5 -> C

- When D is created, as B is kept, we can't remove the <3, B> element,
but we can remove the <4, C> element.
The BTree looks like :

2 -> B
3 -> B
[4 -> C] removed
5 -> C
6 -> D
7 -> D

- When E is created, we can remove the 5 and 6 pages, as they were
created by revisions that are'nt kept.
The BTree looks like :


2 -> B
3 -> B
[5 -> C] removed
[6 -> D] removed
7 -> D
8 -> E
9 -> E

- If we remove the B revision now, we can remove the page it created (2
and 3). In order to do that, we need to keep a track of the pages
created by a specific revision.

All in all, we have to maintaing 3 additional BTrees :
- a first BTree containing the list of pages copied by a revision
- a second BTree containing the revision associated with a specific page
- a third one containing the list of pages created by a revision.

IMO, there is one spurious BTree, and we should be able to deduce it
from the two others, but I still have to come with a good algorithm for
that.

If anyone has a better idea, I'll be really pleased !


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


---------------------------------------------------------------------
To unsubscribe, e-mail: labs-unsubscr...@labs.apache.org
For additional commands, e-mail: labs-h...@labs.apache.org

Reply via email to