Re: ensuring an update_seq is used at most once

Adam Kocoloski Mon, 12 Apr 2010 06:02:08 -0700

Yep.  Db#db.update_seq is not always the same as Db#db.committed_update_seq 
when delayed_commits are on.


Adam

On Apr 12, 2010, at 8:54 AM, Paul Davis wrote:

> An idle curiosity: is it ever possible to replicate something that has
> been written to disk before a header is flushed?
> 
> 
> On Mon, Apr 12, 2010 at 8:46 AM, Adam Kocoloski <[email protected]> wrote:
>> Yep, your analysis is dead-on, and is a more complete solution than what I 
>> proposed.  Best,
>> 
>> Adam
>> 
>> On Apr 12, 2010, at 4:51 AM, Robert Newson wrote:
>> 
>>> Would it be safer to have a low- and high- watermark for the
>>> update_seq in memory? What I mean is that the db writer will never
>>> write out an update_seq that is N higher than the last committed one;
>>> if it is forced to do so, to permit a write, it then fsync's and
>>> resets high_seq to last_committed_seq. This way you can genuinely
>>> ensure that you don't reuse an update_seq. In practice we could allow
>>> a large delta, one that is larger than the number of fsyncs we expect
>>> to manage in the commit interval.
>>> 
>>> Your idea to just bump the update_seq "significantly" mostly pans out
>>> (I know a system that does precisely this) but it would be a data loss
>>> scenario when it doesn't pan out.
>>> 
>>> B.
>>> 
>>> On Mon, Apr 12, 2010 at 3:54 AM, Adam Kocoloski
>>> <[email protected]> wrote:
>>>> Currently a DB update_seq can be reused if there's a power failure before 
>>>> the header is sync'ed to disk.  This adds some extra complexity and 
>>>> overhead to the replicator, which must confirm before saving a checkpoint 
>>>> that the source update_seq it is recording will not be reused later.  It 
>>>> does this by issuing an ensure_full_commit call to the source DB, which 
>>>> may be a pretty expensive operation if the source has a constant write 
>>>> load.
>>>> 
>>>> Should we try to fix that?  One way to do so would be to start at a 
>>>> significantly higher update_seq than the committed one whenever the DB is 
>>>> opened after an "unclean" shutdown; that is, one where the DB header is 
>>>> not the last term stored in the file.  Although, I suppose that's not an 
>>>> ironclad test for data loss -- it might be the case that none of the lost 
>>>> updates were written to the file.  I suppose we could "bump" the 
>>>> update_seq on every startup.
>>>> 
>>>> Adam
>>>> 
>>>> 
>> 
>>
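
Robert's low/high watermark idea from the thread can be sketched roughly as below. This is a toy Python model, not CouchDB code: SeqAllocator, max_delta, next_seq and commit are hypothetical names, commit() merely stands in for an fsync of the header, and resuming at committed_seq + max_delta after an unclean shutdown is one reading of the proposal rather than something the thread spells out.

```python
class SeqAllocator:
    """Toy model of a DB writer handing out update_seqs (illustrative only)."""

    def __init__(self, committed_seq=0, max_delta=1000, unclean=False):
        # max_delta plays the role of "N" in the proposal (assumed name).
        self.max_delta = max_delta
        # Last update_seq known to be durable in the header.
        self.committed_seq = committed_seq
        # After an unclean shutdown, up to max_delta seqs above the committed
        # one may have been handed out and lost, so skip past all of them.
        self.update_seq = committed_seq + (max_delta if unclean else 0)
        self.fsyncs = 0

    def commit(self):
        # Stand-in for writing the header and fsync'ing it.
        self.committed_seq = self.update_seq
        self.fsyncs += 1

    def next_seq(self):
        # Never hand out a seq more than max_delta above the committed one;
        # if the next write would cross that ceiling, force a commit first.
        if self.update_seq + 1 > self.committed_seq + self.max_delta:
            self.commit()
        self.update_seq += 1
        return self.update_seq


if __name__ == "__main__":
    db = SeqAllocator(committed_seq=10, max_delta=5)
    handed_out = [db.next_seq() for _ in range(7)]
    # Simulate a power failure: only db.committed_seq survives on disk.
    reopened = SeqAllocator(committed_seq=db.committed_seq, max_delta=5, unclean=True)
    print(handed_out, reopened.next_seq())
```

With a large max_delta the forced commit should be rare in practice, which is the trade-off Robert describes: the replicator could then trust a source update_seq without calling ensure_full_commit, at the cost of gaps in the sequence after a crash.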
