Re: [Mavibot] Crash recovery system proposed algorithm

Emmanuel Lécharny Sat, 16 Mar 2013 20:01:10 -0700

On 3/16/13 7:59 PM, sebb wrote:
> On 14 March 2013 08:56, Emmanuel Lécharny <elecha...@gmail.com> wrote:
>> On 3/13/13 4:39 PM, Emmanuel Lécharny wrote:
>>> One small update, as I have made a mistake in my initial mail :
>>>
>>>
>>> It's not implemented atm, will work on that.
>>>
>>> Any better idea ?
>> I thought about the proposal again this morning, and found it overly complex.
>>
>> A better idea is to store an offset to the BTree headers in a list of
>> BTree offsets, at the beginning of the file. If this list of offsets
>> can't be stored in a single page, we will use a new page to store the
>> overflowing offsets. Adding or removing a BTree will just be a matter of
>> adding or removing an offset from this list (which might require a
>> rewrite of those pages).
>> Thoughts?
> Seems to me that the currently proposed solutions all depend on the
> disk blocks being updated in a specific sequence.


True. The alternative is to use a journal that stores the pending
operations, is flushed at regular intervals, and is replayed when we
recover from a crash. We can also force the data to be written to
disk, using file.getFD().sync(), on every modification, but this is
extremely costly.
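
To make the journal idea concrete, here is a minimal sketch in Java. The
class name JournalSketch, the length-prefixed record layout, and the
string-encoded operations are all assumptions made for illustration; this is
not Mavibot's actual journal. It appends pending operations, calls
getFD().sync() only when flush() is invoked, and replays the complete
records after a restart.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only journal; not Mavibot's actual implementation. */
public class JournalSketch implements AutoCloseable {
    private final RandomAccessFile file;

    public JournalSketch(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.seek(file.length()); // always append after existing records
    }

    /** Appends one length-prefixed record; it may still sit in OS buffers. */
    public void append(String operation) throws IOException {
        byte[] bytes = operation.getBytes(StandardCharsets.UTF_8);
        file.writeInt(bytes.length);
        file.write(bytes);
    }

    /** Asks the OS to push everything appended so far to the device (the costly step). */
    public void flush() throws IOException {
        file.getFD().sync();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Reads back the complete records; a record cut short at the tail is ignored. */
    public static List<String> replay(String path) throws IOException {
        List<String> operations = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            while (true) {
                int length = in.readInt();
                byte[] bytes = new byte[length];
                in.readFully(bytes);
                operations.add(new String(bytes, StandardCharsets.UTF_8));
            }
        } catch (EOFException endOfJournal) {
            // End of file, or a partially written last record: stop replaying here.
        }
        return operations;
    }
}
```

Calling flush() once per batch rather than once per append() is the trade-off
described above: fewer syncs, at the price of losing the operations appended
since the last flush if the system crashes.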

We use a journal for the persisted BTree (i.e., an in-memory BTree backed
by a file on disk).

Right now, I don't 'force' the data to be written to disk - i.e., if the
system crashes, you may lose something - as I'm focusing on getting the
data stored correctly when everything is fine. This is obviously
something that needs to be improved. However, the critical part is to
guarantee that we always point to the correct data when we start the DB.
That requires that we always flush the new versions before flushing the
BTree header, which refers to this new version.
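
The ordering constraint can be sketched as follows. The file layout here (an
8-byte header at offset 0 that holds the offset of the current version, with
versions appended as length-prefixed blobs) and the names OrderedFlushSketch,
commit, and readCurrent are assumptions for illustration only, not Mavibot's
on-disk format. The point is the sequence: write the new version, sync, then
update the header, then sync again.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical layout: an 8-byte header at offset 0 holding the current version's offset. */
public class OrderedFlushSketch {
    static final long HEADER_OFFSET = 0L;
    static final int HEADER_SIZE = Long.BYTES;

    /** Appends a new version, syncs it, then repoints the header at it. */
    public static long commit(RandomAccessFile file, byte[] newVersion) throws IOException {
        // 1. Write the new version into free space; never overwrite the old one.
        long offset = Math.max(file.length(), HEADER_SIZE);
        file.seek(offset);
        file.writeInt(newVersion.length);
        file.write(newVersion);
        // 2. Ask the OS to push the data to the device before touching the header.
        file.getFD().sync();
        // 3. Only now update the header so that it refers to the new version.
        file.seek(HEADER_OFFSET);
        file.writeLong(offset);
        // 4. Sync again so the header update is pushed out as well.
        file.getFD().sync();
        return offset;
    }

    /** Follows the header to the version it currently refers to. */
    public static byte[] readCurrent(RandomAccessFile file) throws IOException {
        file.seek(HEADER_OFFSET);
        long offset = file.readLong();
        file.seek(offset);
        byte[] version = new byte[file.readInt()];
        file.readFully(version);
        return version;
    }
}
```

If a crash happens between steps 2 and 3, the header still refers to the
previous version, which was never overwritten. Note that sync() is only a
request to the OS; whether the bytes really reach stable storage in that
order also depends on the disk and its write cache.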



> Depending on the hardware/OS/language being used, AFAIK this may not
> be possible to enforce.
>
> May I suggest that any assumptions about the behaviour of the host
> disk system should be clearly documented?
Right now, no assumptions are made. In the near future, once all the
basic operations work correctly, we will have to think about those
assumptions, and add the code to allow recovery after a crash.

Do you have any proposals, ideas, or suggestions?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 

