Re: [Mavibot] Crash recovery system proposed algorithm

Kiran Ayyagari Wed, 13 Mar 2013 02:48:01 -0700

On Wed, Mar 13, 2013 at 2:55 PM, Emmanuel Lécharny <elecha...@gmail.com> wrote:


> Le 3/12/13 6:34 PM, Kiran Ayyagari a écrit :
> > On Tue, Mar 12, 2013 at 10:23 PM, Emmanuel Lécharny <elecha...@gmail.com>
> > wrote:
> >
> >
> >> 3) if the two revisions are different, that means we had a crash: we
> >> will have to read all the pages, and discard the pending N+1 revision
> >> pages.
> >>
> > I would suggest that we keep the revisions N-1 and N,
> > and during each update N-1 will be replaced with N and N with N+1.
> > As long as the difference between the revisions is 1 we can assume that
> > there was no crash.
> > In case of a crash we can start with the existing N-1 revision as the
> > base to recover.
>
> If we assume that revision N was ok, we can always recover from it. The
> pending N+1 revision just means we are not clean with the new revision.
>
> The status will be :
>
> T0 : N and N
> T1 (starting with a new revision) : N and N+1
> T2 (done with the new revision) : N+1 and N+1 -> back to a stable state
>
> I have a little trouble mapping what you propose onto a time line :
>
> T0 : N-1 and N
> T1 (starting with a new revision) : N-1, N and N+1 (is this correct ?)
> T2 (done with the new revision) : N and N+1
>

The time line would be:

T0 : N-1 and N
T1 (starting with a new revision) : N and NULL (replace N-1 with N and set
the current revision to NULL, because the update is ongoing)
T2 (done with the new revision) : N and N+1 (update the current revision to
N+1 _after_ updating the BTree)
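
To make the time line above concrete, here is a rough sketch of how the
two header slots could be checked at startup. It is only an illustration:
the class and method names are made up, and using -1 as the NULL marker
is an assumption, not something Mavibot defines.

```java
final class RevisionSlots {
    /** Marker for "update in progress"; the value -1 is an assumption. */
    static final long NULL_REVISION = -1L;

    enum State { CLEAN, CRASHED }

    /** Classifies the header from its two slots (previous, current). */
    static State classify(long previous, long current) {
        // T0 and T2: the slots hold consecutive revisions.
        if (current != NULL_REVISION && current == previous + 1) {
            return State.CLEAN;
        }
        // T1 found on disk (current is NULL) or any other gap: treat as a crash.
        return State.CRASHED;
    }

    /** Revision to restart from: current if clean, otherwise the previous slot. */
    static long recoveryBase(long previous, long current) {
        return classify(previous, current) == State.CLEAN ? current : previous;
    }

    public static void main(String[] args) {
        System.out.println(classify(4, 5));                 // CLEAN (T0: N-1 and N)
        System.out.println(classify(5, NULL_REVISION));     // CRASHED (stopped at T1)
        System.out.println(recoveryBase(5, NULL_REVISION)); // 5
        System.out.println(classify(5, 6));                 // CLEAN (T2: N and N+1)
    }
}
```

With this layout a crash at T1 leaves (N, NULL) on disk, so recovery
restarts from the first slot, which already holds N.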

> Detecting that we had a failure in this case would imply we have N-1 and
> N+1, but what would keeping N be good for ?
>
> Regarding my proposal : we will have to update the BTree header twice :
> once when we start the modification, to add N+1 revision, and once at
> the end to remove N and replace it with N+1. This is costly. Assuming
> that those two elements are 2 longs, which will be stored on the same
> page, or two pages at worst (if we have many BTrees and a BTree header
> spans across two physical pages), I'm not sure we can't simply update
> the new status only once, at the end of the BTree update.
>

I propose that we update the header only after updating the BTree;
this way, if for _some_ reason the BTree update fails, we don't end up
with a header pointing to an incorrect/non-existing revision.
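
A minimal sketch of that ordering, using an in-memory map as a stand-in
for the file. Everything here (class name, the map, the simulated
failure flag) is hypothetical and only meant to show that the header is
touched last:

```java
import java.util.HashMap;
import java.util.Map;

final class HeaderLastCommit {
    /** Stand-in for the data pages: revision -> root page content (hypothetical). */
    private final Map<Long, String> pages = new HashMap<>();
    /** Stand-in for the revision stored in the BTree header. */
    private long headerRevision = 0L;

    HeaderLastCommit() {
        pages.put(0L, "root@0");
    }

    /** Writes the new revision's pages first, then publishes it in the header. */
    void commit(String newRoot, boolean failDuringPageWrite) {
        long next = headerRevision + 1;
        if (failDuringPageWrite) {
            // Simulated failure before the pages are complete: header untouched.
            throw new IllegalStateException("page write failed for revision " + next);
        }
        pages.put(next, newRoot); // step 1: BTree pages
        headerRevision = next;    // step 2: header, only after step 1 succeeded
    }

    long headerRevision() {
        return headerRevision;
    }

    /** True if the header points to a revision whose pages exist. */
    boolean headerIsValid() {
        return pages.containsKey(headerRevision);
    }
}
```

If commit() fails, headerRevision() still returns the last good revision
and headerIsValid() stays true, which is the property we want.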

> Another thing : the fact that BTree headers might span across two pages
> is really annoying : we can have a crash after having updated the first
> page but before updating the second one, leading to inconsistencies. An
> option would be that each BTree header is stored in one single page, so
> that we always store that information in one single page.
> >
>
+1 for one header per page (we can even make the header page a little larger
than the other pages if needed; I am not sure how this would impact the
existing code, it is just an idea)

> >
> >> Reclaiming the pending pages is a matter of reading *all* the pages, and
> >> for each page that is not linked to another one, we can safely move it
> >> to the list of free pages. As this is an expensive task, which requires a
> >> lot of memory if we have lots of pages, we may also create a new file
> >> containing only the latest valid revision.
> >>
> > we should keep this as a background task
> Sure, we can do that in order to avoid blocking the server for minutes
> at startup. That's a smart idea !
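
For the reclaiming step, a rough sketch of the scan, reading "not linked"
as "not reachable from the root of the latest valid revision". The page
ids, the children map and the single root are assumptions made for the
example; the real page layout may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class OrphanPageSweep {
    /**
     * Returns the ids of the pages that cannot be reached from rootId.
     * children maps a page id to the ids of the pages it links to.
     */
    static Set<Long> findOrphans(Set<Long> allPages,
                                 Map<Long, List<Long>> children,
                                 long rootId) {
        Set<Long> reachable = new HashSet<>();
        Deque<Long> pending = new ArrayDeque<>();
        pending.push(rootId);
        // Mark phase: walk every link starting from the valid root.
        while (!pending.isEmpty()) {
            long page = pending.pop();
            if (reachable.add(page)) {
                for (long child : children.getOrDefault(page, List.of())) {
                    pending.push(child);
                }
            }
        }
        // Sweep phase: whatever was never visited can go to the free list.
        Set<Long> orphans = new TreeSet<>(allPages);
        orphans.removeAll(reachable);
        return orphans;
    }
}
```

The reachable set is what costs memory when there are lots of pages,
which is one more reason to run this in the background.
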
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Kiran Ayyagari
http://keydap.com
