[ 
https://issues.apache.org/jira/browse/COUCHDB-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002195#comment-13002195
 ] 

Randall Leeds commented on COUCHDB-1080:
----------------------------------------

This probably qualifies as a bug fix, but if backporting it to the old 
replicator is a pain, don't worry about it. We should get 1.1 out the door so 
we can branch 1.2 before long; otherwise this development cycle is going to 
get long again and there'll be a temptation to let more features creep in.

> fail fast with checkpoint conflicts
> -----------------------------------
>
>                 Key: COUCHDB-1080
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1080
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 1.0.2
>            Reporter: Randall Leeds
>             Fix For: 1.1, 1.2
>
>         Attachments: COUCHDB-1080-2-fdmanana.patch, 
> COUCHDB-1080-3-fdmanana.patch, COUCHDB-1080-fdmanana.patch, 
> paranoid_checkpoint_failure.patch, paranoid_checkpoint_failure_v2.patch
>
>
> I've thought about this long and hard and probably should have submitted the 
> bug a long time ago. I've also run this in production for months.
> When a checkpoint conflict occurs it is almost always the right thing to do 
> to abort.
> If there is a rev mismatch it could mean there are two conflicting 
> (continuous and one-shot) replications running between the same hosts. 
> Without reloading the history documents, checkpoints will continue to fail 
> forever. This could leave us in a state with many replicated changes but no 
> checkpoints.
> Similarly, a successful checkpoint but a lost/timed-out response could cause 
> this situation.
> Since the supervisor will restart the replication anyway, I think it's safer 
> to abort and retry.
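
The abort-and-retry behaviour described in the issue can be illustrated with a
small, self-contained sketch. This is not the CouchDB replicator (which is
written in Erlang) and it is not taken from any of the attached patches; every
name here (CheckpointStore, replicate, supervise, interfere_at) is hypothetical
and exists only to show the idea: a stale revision makes the checkpoint write
fail, the attempt aborts immediately, and the restarted attempt reloads the
history before continuing.

```python
class Conflict(Exception):
    """Raised when a checkpoint write carries a stale revision."""


class CheckpointStore:
    """Toy stand-in for a checkpoint document guarded by a revision number."""

    def __init__(self):
        self.rev = 0  # current revision of the checkpoint document
        self.seq = 0  # last checkpointed sequence number

    def read(self):
        return self.rev, self.seq

    def write(self, expected_rev, seq):
        # Reject the write if the caller's revision is out of date.
        if expected_rev != self.rev:
            raise Conflict(f"expected rev {expected_rev}, found {self.rev}")
        self.rev += 1
        self.seq = seq
        return self.rev


def replicate(store, changes, interfere_at=None):
    """One replication attempt that aborts on the first checkpoint conflict."""
    rev, seq = store.read()  # reload the history at the start of every attempt
    for change in changes:
        if change <= seq:
            continue  # already covered by an earlier checkpoint
        if change == interfere_at:
            # Simulate a second replication updating the same checkpoint.
            other_rev, other_seq = store.read()
            store.write(other_rev, other_seq)
        # A Conflict raised here propagates: fail fast, do not keep going.
        rev = store.write(rev, change)
    return store.read()[1]


def supervise(store, changes, max_restarts=3, interfere_at=None):
    """Restart the replication after a conflict, as a supervisor would."""
    attempts = 0
    while True:
        attempts += 1
        try:
            replicate(store, changes, interfere_at)
            return attempts
        except Conflict:
            if attempts > max_restarts:
                raise
            interfere_at = None  # assume the interference was a one-off
```

Calling supervise(CheckpointStore(), [1, 2, 3], interfere_at=2) aborts the
first attempt at the conflicting checkpoint and completes on the second,
resuming from the last checkpointed sequence. Had the first attempt ignored
the conflict and kept its stale revision, every later checkpoint write would
have failed as well.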

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
