amaltaro opened a new issue, #4078:
URL: https://github.com/apache/couchdb/issues/4078

   ## Description
   I am bringing this discussion over from the CouchDB Slack channel; it is still to be decided whether it is really a bug.
   
   So, I started a replication of a database that has ~3M deleted documents and only ~60k non-deleted ones (the database reports an update_seq of around 37M). The replication document is defined with a JavaScript filter function that skips deleted documents.
   Checking the CouchDB logs, I noticed that the replication was failing and starting over again and again. The error reported in the log is something like:
   ```
   Replicator, request GET to "source_url_db/_changes?filter=ddoc_with_filter&feed=continuous&style=all_docs&since=0&timeout=100000" failed due to error req_timedout
   ```
   
   My understanding is that it is actually timing out because no documents passed the filter during that 100-second window, thus failing the _changes request.
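   
   If it helps to reproduce the behaviour outside of the replicator, the same continuous _changes request can be issued by hand. This is only a sketch: the URL and filter name are taken from the reproduction steps below rather than from the redacted log line.
   ```
   # Issue the same continuous _changes request the replicator makes. With ~3M
   # deleted documents and a filter that rejects all of them, the feed can stay
   # silent for longer than the client-side timeout.
   curl -ks "http://URL:5984/source_db/_changes?filter=WorkQueue/filterDeletedDocs&feed=continuous&style=all_docs&since=0&timeout=100000"
   ```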
   
   ## Steps to Reproduce
   Execute a replication task for a source database where the vast majority of 
documents are deleted.
   1. Create a design document with a filter function on the source node (the filter function must be a JSON string):
   ```
   curl -ks -X PUT http://URL:5984/source_db/_design/WorkQueue -H 'Content-Type: application/json' -d '{"filters": {"filterDeletedDocs": "function(doc, req) {return !doc._deleted}"}}'
   ```
   2. On the target node (running the replicator), create a replication document:
   ```
   curl -ks -X POST http://URL:5984/_replicator -H 'Content-Type: application/json' -d '{"_id":"rep_wq", "source":"https://URL/couchdb/source_db/", "target":"http://URL:5984/target_db/", "continuous": true, "filter":"WorkQueue/filterDeletedDocs"}'
   ```
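   
   Once the replication document exists, the job state and the last error reported by the scheduler can be inspected on the target node. This assumes the default _replicator database and the rep_wq document id used above.
   ```
   # Inspect the replication job's state, history and last reported error
   curl -ks http://URL:5984/_scheduler/docs/_replicator/rep_wq
   ```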
   
   ## Expected Behaviour
   I would expect the replication not to time out when it takes longer than the timeout defined in the replication document (or in the replicator configuration) to find documents matching the replication filter. In other words, if the source CouchDB instance is doing heavy reads in order to serve the replication, that should be considered normal and no timeout should be raised.
   
   ## Your Environment
   The source database is running CouchDB 1.6.1 (hopefully soon to be upgraded!).
   The target database (running the replicator) is on CouchDB 3.1.2:
   ```
   {"couchdb":"Welcome","version":"3.1.2","git_sha":"572b68e72","uuid":"7890c19e733322170739ab2520eb4ab1","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
   ```
   
   The replicator/target node is configured with:
   ```
   [replicator]
   use_checkpoints = true
   checkpoint_interval = 120000
   worker_processes = 4
   http_connections = 10
   worker_batch_size = 2000
   socket_options = [{keepalive, true}, {nodelay, true}]
   max_replication_retry_count = infinity
   connection_timeout = 900000
   ```
   
   I also tested replication with the following changes (with no success):
   ```
   [chttpd]
   changes_timeout = 300000
   ```
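   
   To double-check which values are actually in effect on the replicator node, the running configuration can be queried through the node config API (a sketch assuming CouchDB 3.x, where the _local node alias is available; the URL is a placeholder as above).
   ```
   # Verify the replicator and chttpd settings loaded on the target node
   curl -ks http://URL:5984/_node/_local/_config/replicator
   curl -ks http://URL:5984/_node/_local/_config/chttpd/changes_timeout
   ```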
   
   Lastly, here is a snapshot of the database I was trying to replicate from:
   ```
   {"db_name":"workqueue","doc_count":61306,"doc_del_count":3345978,"update_seq":37812510,"purge_seq":0,"compact_running":false,"disk_size":8065118340,"data_size":2165447890,"instance_start_time":"1655923205137624","disk_format_version":6,"committed_update_seq":37812510}
   ```
   
   Regarding the OS, it's CentOS Linux 7 (x86_64)
   
   ## Additional Context
   The _changes timeout is followed by this error:
   ```
   [error] 2022-06-16T22:27:59.320726Z [email protected] <0.30721.1> -------- ChangesReader process died with reason: {changes_reader_died,{timeout,ibrowse_stream_cleanup}}
   ```
   which then restarts the replication job.
   

