nickva opened a new pull request #721: Save migrated replicator checkpoint documents immediately
URL: https://github.com/apache/couchdb/pull/721
 
 
   Previously, if the replication id algorithm was updated, the replicator would
   migrate checkpoint documents but keep them only in memory. They would be
   written to their respective databases only when a checkpoint needed to be
   updated, which doesn't happen unless the source database changes. As a result
   it was possible for checkpoints to be lost. Here is how it could happen:
   
   1. Checkpoints were created with the current version (3) of the replication id
   algorithm. Assume the replication document contains credentials that look like
   'adm:pass', and the computed v3 replication id is "3abc...".
   
   2. Replication id algorithm is updated to version 4. Version 4 ignores
   passwords, such that changing authentication from 'adm:pass' to 'adm:pass2'
   would not change the replication ids.
   
   3. Server code is updated with version 4. The replicator looks for checkpoints
   with the new version 4 id, which it calculates to be "4def...". It can't find
   one, so it looks for v3, finds "3abc..." and decides to migrate it. However,
   migration only happens in memory. That is, the checkpoint document is updated
   but it needs a checkpoint to happen for it to be written to disk.
   
   4. There are no changes to the source db. So no checkpoints are forced to
   happen.
   
   5. The user hears that the new replicator version is improved: passwords
   shouldn't alter the replication ids, so all the checkpoints are reused. They
   update the replication document with their new credentials, 'adm:pass2'.
   
   6. The updated document with 'adm:pass2' credentials is processed by the
   replicator. It computes the v4 replication id, "4def...". It's the same as
   before since it wasn't affected by the pass -> pass2 change. That replication
   checkpoint document is found on neither the source nor the target. The
   replicator then computes v3 of the id to find the older version. However, v3 is
   affected by passwords, so it computes "3ghi...", which is different from the
   previous v3 id, "3abc...". It cannot find it. It computes v2 and checks, then
   v1, and eventually gives up on finding a checkpoint and restarts the changes
   feed from 0 again.
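
The sequence above can be reproduced with a small Python model. This is only an illustrative sketch, not CouchDB code: the `rep_id` hashing scheme, the `find_checkpoint` helper, the dict standing in for a database, and the sequence number 42 are all assumptions made up for the example. The only property carried over from the description is that ids before v4 depend on the password and the v4 id does not.

```python
import hashlib


def rep_id(version, source, target, user, password):
    """Hypothetical id scheme: only v4 leaves the password out of the hash."""
    parts = [source, target, user]
    if version < 4:
        parts.append(password)
    digest = hashlib.sha256("|".join(parts).encode()).hexdigest()[:8]
    return f"{version}{digest}"


def find_checkpoint(stored, doc):
    """Try v4 first, then fall back to v3, v2, v1. Lookup only, nothing is written."""
    for version in (4, 3, 2, 1):
        checkpoint = stored.get(rep_id(version, **doc))
        if checkpoint is not None:
            return version, checkpoint
    return None, None


doc = {"source": "src", "target": "tgt", "user": "adm", "password": "pass"}

# Step 1: a checkpoint exists under the v3 id only (a dict stands in for the db).
stored = {rep_id(3, **doc): {"source_last_seq": 42}}

# Step 3: after the upgrade the v4 lookup misses and the v3 lookup hits.
# The migrated copy lives only in memory, so `stored` is left unchanged.
found_before = find_checkpoint(stored, doc)

# Steps 5-6: the password changes. The v4 id stays the same, the v3 id does not,
# so every lookup misses and replication would restart from sequence 0.
updated = dict(doc, password="pass2")
found_after = find_checkpoint(stored, updated)
```

In this model `found_before` holds the v3 checkpoint, while `found_after` is `(None, None)` because nothing was ever stored under the v4 id.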
   
   To fix it, update `find_replication_logs` to also write the migrated
   replication checkpoint documents to their respective databases as soon as it
   finds them.
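
The idea behind the fix can be sketched in Python as follows. The real change is in CouchDB's Erlang code and is not reproduced here; `find_replication_logs_sketch`, its signature, and the dicts standing in for the source and target databases are hypothetical stand-ins chosen for illustration.

```python
def find_replication_logs_sketch(source_db, target_db, candidate_ids):
    """Return (source_log, target_log) for the first id that has a checkpoint.

    `candidate_ids` is ordered newest algorithm version first. When a hit is
    found under an older id, the checkpoint is copied under the newest id in
    both databases right away instead of waiting for the next checkpoint.
    """
    newest_id = candidate_ids[0]
    for rep_id in candidate_ids:
        source_log = source_db.get(rep_id)
        target_log = target_db.get(rep_id)
        if source_log is None and target_log is None:
            continue  # nothing stored under this id, try an older version
        if rep_id != newest_id:
            # Migration: persist immediately so the checkpoint survives even
            # if the source database never changes again.
            if source_log is not None:
                source_db[newest_id] = dict(source_log)
            if target_log is not None:
                target_db[newest_id] = dict(target_log)
        return source_log, target_log
    return None, None


# Checkpoints exist only under the old v3 id.
source_db = {"3abc": {"source_last_seq": 42}}
target_db = {"3abc": {"source_last_seq": 42}}

# First lookup after the upgrade migrates and writes the v4 copies.
first = find_replication_logs_sketch(source_db, target_db, ["4def", "3abc"])

# After the password change the v3 candidate differs, but the v4 copy is found.
second = find_replication_logs_sketch(source_db, target_db, ["4def", "3ghi"])
```

With the write happening at lookup time, the second call succeeds on the v4 id even though the recomputed v3 id no longer matches anything stored.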
   
   Related to issue #689 
 
