Nick Vatamaniuc created COUCHDB-2965:
----------------------------------------

             Summary: Race condition in replicator rescan logic
                 Key: COUCHDB-2965
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2965
             Project: CouchDB
          Issue Type: Bug
          Components: Replication
            Reporter: Nick Vatamaniuc


There is a race condition between the full rescan and regular change feed
processing in the couch_replicator_manager code.

This race condition can leave replication docs in an untriggered state
when a rescan of all the docs is performed. The rescan might happen when nodes
connect and disconnect. The likelihood of hitting this race condition goes up
if a lot of documents are updated and there is a back-up of messages in the
replicator manager's mailbox.

The race condition happens in the following way:

 * A full rescan is initiated here:

https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424

It clears the db_to_seq ets table, which holds the latest change sequence for
each replicator database, and then launches a scan_all_dbs process.

 * scan_all_dbs will find all databases that look like replicator databases
and, for each one, send a {resume_scan, DbName} message to the main
couch_replicator_manager process.

 * {resume_scan, DbName} message is handled here:

https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233

The expectation is that, because db_to_seq was reset, the handler does not find
a sequence checkpoint in db_to_seq, so it starts from 0 and spawns a new change
feed, which will rescan all documents (since we need to determine ownership for
them).
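The intended lookup can be sketched as follows. This is a minimal Python model,
not the Erlang code: a plain dict stands in for the db_to_seq ets table, and
start_seq_for is a hypothetical name for the "find the checkpoint or fall back
to 0" step.

```python
# Hypothetical model of the expected behaviour; names are illustrative only.
db_to_seq = {}  # stands in for the db_to_seq ets table


def start_seq_for(db_name):
    """Sequence a new change feed would start from when resume_scan arrives."""
    # No checkpoint recorded means start from 0, i.e. rescan every document.
    return db_to_seq.get(db_name, 0)


db_to_seq["_replicator"] = 1042  # checkpoint left by an earlier change feed
db_to_seq.clear()                # the full rescan resets the table
start = start_seq_for("_replicator")
```

With nothing written to the table between the reset and the lookup, the feed
starts from 0, which is the behaviour the rescan relies on.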

But the race condition occurs because when change feeds stop, they call the
replicator manager with a {rep_db_checkpoint, DbName} message. That updates the
db_to_seq ets table with the latest change sequence.

https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225

Which means this sequence of operations could happen:

 * db_to_seq is reset to 0, scan_all_dbs is spawned

 * change feed stops at sequence 1042, it calls {rep_db_checkpoint, 
<<"_replicator">>}

 * {rep_db_checkpoint, <<"_replicator">>} call is handled, now latest db_to_seq 
for _replicator is 1042

 * {resume_scan, <<"_replicator">>} is sent from the scan_all_dbs process

 * {resume_scan, <<"_replicator">>} is received by the replicator manager. It
sees that db_to_seq has _replicator with latest sequence 1042, so it will start
from that instead of 0, thus skipping updates from 0 to 1042.
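The interleaving above can be replayed with a small single-mailbox model. This
is a sketch in Python, not the Erlang implementation: run_manager, the "rescan"
message, the pre-existing checkpoint of 900, and the explicit sequence field
carried by the rep_db_checkpoint tuple are all assumptions made for
illustration.

```python
from collections import deque


def run_manager(mailbox):
    """Process messages strictly in order, as one process with one mailbox.

    Returns (db_name, start_seq) for every change feed the model would spawn.
    """
    db_to_seq = {"_replicator": 900}  # hypothetical pre-existing checkpoint
    spawned = []
    queue = deque(mailbox)
    while queue:
        msg = queue.popleft()
        kind = msg[0]
        if kind == "rescan":
            # Full rescan: forget all checkpoints.
            db_to_seq.clear()
        elif kind == "rep_db_checkpoint":
            # A stopping change feed reports its last sequence.
            _, db, seq = msg
            db_to_seq[db] = seq
        elif kind == "resume_scan":
            # Start from the stored checkpoint, or 0 if there is none.
            _, db = msg
            spawned.append((db, db_to_seq.get(db, 0)))
    return spawned


# Ordering the rescan logic expects: nothing lands between reset and resume.
expected_order = [("rescan",), ("resume_scan", "_replicator")]

# Racy ordering: a late checkpoint arrives between reset and resume.
racy_order = [
    ("rescan",),
    ("rep_db_checkpoint", "_replicator", 1042),
    ("resume_scan", "_replicator"),
]

expected_result = run_manager(expected_order)
racy_result = run_manager(racy_order)
```

In the expected ordering the feed starts from 0. In the racy ordering it starts
from 1042, so documents with earlier sequences are never re-examined.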

This was seen in an experiment in which 1000 replication documents were being
updated. Around document 700 or so, node1 was killed (pkill -f node1). node2
experienced the race condition on rescan and never picked up a number of
documents that should have belonged to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
