nickva commented on issue #4385:
URL: https://github.com/apache/couchdb/issues/4385#issuecomment-1397565505

   In dock.zip I noticed local.ini wasn't a text file but some kind of binary data:
   
   ```
    % cat local.ini
   �E�c'K�cnO�cuxUT
   ```
   
   One thing to pay attention to is whether the replication ID changes. Based on your start_log I see that the checkpoint was *not* found for the catalogues endpoint:
   
   ```
   [notice] 2023-01-19T13:30:16.056403Z nonode@nohost <0.1150.0> 34af135719 couch1:5984 192.168.100.227 admin GET /catalogues/ 200 ok 2
   [notice] 2023-01-19T13:30:16.131009Z nonode@nohost <0.1150.0> fb7b32c9cd couch1:5984 192.168.100.227 admin GET /catalogues/_local/475a01ff4762aae18390232479a85acd 404 ok 16
   [notice] 2023-01-19T13:30:16.133231Z nonode@nohost <0.1150.0> ca52dcf06f couch1:5984 192.168.100.227 admin GET /catalogues/_local/21d33a0db3bd438bc2fe58ef3e64a1a7 404 ok 2
   [notice] 2023-01-19T13:30:16.171877Z nonode@nohost <0.1150.0> 9adcacce61 couch1:5984 192.168.100.227 admin GET /catalogues/_local/9dea40c7f506f19799311d927b321449 404 ok 38
   [notice] 2023-01-19T13:30:16.173819Z nonode@nohost <0.1150.0> 0ab11d84ae couch1:5984 192.168.100.227 admin GET /catalogues/_local/294224beca58e21bde9e0a5676df7d05 404 ok 1
   ```
   
   Notice the 404 on the `_local/$replicationid` docs.
   
   Not sure about lotimages_new, as those logs start after "Starting replication..." already.
   
   So what may be happening is that your replication IDs change inadvertently when you update your docker configs. If replication IDs change, previous checkpoints won't be found, and replication will rewind from 0. Now, if your source, target and other replication parameters stay the same, it's most likely the server uuid `[couchdb] uuid = ...` value that's not consistent. The setting is described [here](https://docs.couchdb.org/en/stable/config/couchdb.html#couchdb/uuid).
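   
   For example, a minimal snippet in local.ini to pin it would look something like this (the uuid value below is just a made-up placeholder):
   
   ```
   [couchdb]
   ; use the same value on every node in the cluster
   uuid = 0a1b2c3d4e5f60718293a4b5c6d7e8f9
   ```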
   
   Here is the description of the replication ID generation algorithm: 
https://docs.couchdb.org/en/stable/replication/protocol.html#generate-replication-id
   
   If that setting is not specified, a random value is generated, and that would cause your replication IDs to change every time you spin up docker containers, unless you persist your config or explicitly set `[couchdb] uuid = ...`.
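   
   As a rough sketch, assuming the official image's default config directory `/opt/couchdb/etc/local.d` (adjust for your setup), you could mount a persisted config file like:
   
   ```
   % docker run -d --name couch1 -p 5984:5984 \
       -v $(pwd)/couch1.ini:/opt/couchdb/etc/local.d/couch1.ini \
       couchdb:3
   ```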
   
   That value doesn't have to be a proper UUID. You could use a hostname or some other identifier that uniquely identifies the "cluster". In addition, make sure it's set to the same value on all the nodes in the cluster. If you have 3 nodes (couch1, couch2, couch3), ensure the uuid is the same on all of them.
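   
   A quick way to double-check is to hit the root endpoint on each node and compare the `uuid` field in the welcome response (credentials and hostnames below are placeholders):
   
   ```
   % curl -s http://admin:password@couch1:5984/ | grep -o '"uuid":"[^"]*"'
   % curl -s http://admin:password@couch2:5984/ | grep -o '"uuid":"[^"]*"'
   % curl -s http://admin:password@couch3:5984/ | grep -o '"uuid":"[^"]*"'
   ```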
   
   Checkpoints are persisted on both the source and target endpoints (databases) in `_local/$base_replication_id` docs. A replication will only resume from a checkpoint if it finds that checkpoint on *both* the source and the target. So in your logs you could monitor those 404s; eventually one of the checkpoint lookups should succeed (a 200 response). A few 404s are expected, since we also try to load checkpoints for older versions of the replication ID (the generation algorithm has evolved at least 4 times). Then monitor whether the replication ID values stay the same or change across restarts.
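   
   For example, taking the first replication ID from your log above (475a01ff4762aae18390232479a85acd), you could check for the checkpoint doc on both sides; a 200 from both the source and the target means the replication can resume from that checkpoint (hostnames and credentials are placeholders):
   
   ```
   % curl -s -o /dev/null -w "%{http_code}\n" \
       http://admin:password@couch1:5984/catalogues/_local/475a01ff4762aae18390232479a85acd
   % curl -s -o /dev/null -w "%{http_code}\n" \
       http://admin:password@target-host:5984/catalogues/_local/475a01ff4762aae18390232479a85acd
   ```
   
   You can also see the currently running replication jobs and their IDs with `GET /_scheduler/jobs`.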
   

