raulmartinezr opened a new issue #3083:
URL: https://github.com/apache/couchdb/issues/3083
## Description
We use **dockerized CouchDB** in our local development environment to **test applications**.
The procedure for each test is:
- create a database in CouchDB
- test whatever we need to test inside this database
- delete the database if it is no longer needed
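The per-test lifecycle above can be sketched as a Python context manager that always deletes the database, even when a test fails. This is a minimal sketch under assumptions, not the reporter's actual harness: the `send(method, path, body)` transport, the `test_` name prefix and the use of HTTP Basic auth are hypothetical choices; `PUT /{db}?partitioned=true` and `DELETE /{db}` are the standard CouchDB 3 database endpoints.

```python
import base64
import contextlib
import json
import urllib.request
import uuid
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status, parsed_json_or_None)
Send = Callable[[str, str, Optional[dict]], Tuple[int, object]]


def make_urllib_send(base_url: str, user: str, password: str) -> Send:
    """Build a send() that talks to CouchDB using HTTP Basic auth (an assumption)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def send(method: str, path: str, body: Optional[dict] = None):
        data = None if body is None else json.dumps(body).encode()
        req = urllib.request.Request(base_url + path, data=data, method=method)
        req.add_header("Authorization", "Basic " + token)
        req.add_header("Content-Type", "application/json")
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(req, timeout=30) as resp:
            raw = resp.read()
            return resp.status, json.loads(raw) if raw else None

    return send


@contextlib.contextmanager
def temporary_db(send: Send, partitioned: bool = True) -> Iterator[str]:
    """Create a uniquely named database, yield its name, always delete it."""
    # DB names must start with a lowercase letter; uuid4().hex is [0-9a-f]
    name = "test_" + uuid.uuid4().hex
    query = "?partitioned=true" if partitioned else ""
    status, _ = send("PUT", f"/{name}{query}", None)
    if status not in (201, 202):
        raise RuntimeError(f"could not create {name}: HTTP {status}")
    try:
        yield name
    finally:
        # Runs on success and on test failure alike
        send("DELETE", f"/{name}", None)
```

Each test would then run inside `with temporary_db(make_urllib_send("http://127.0.0.1:5984", "admin", "<password>")) as db: ...`. A fresh random name per test means no test depends on the deletion of a previous test's database having completed.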
We have **approx. 70 tests**.
In each database we create a few documents:
- usually **fewer than 10 documents** (at most 50 when we test indexes defined in design documents)
- **2 design documents**, one with two Mango indexes and the other with one Mango index
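For reference, a setup like the one described (two design documents holding two and one Mango indexes) can be expressed as `POST /{db}/_index` request bodies. The index names, field names and the `send(method, path, body)` transport below are hypothetical; `index.fields`, `ddoc`, `name`, `type` and `partitioned` are parameters of CouchDB's `_index` endpoint, and indexes posted with the same `ddoc` are stored in the same design document.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status, parsed_json)
Send = Callable[[str, str, Optional[dict]], Tuple[int, object]]


def index_body(ddoc: str, name: str, fields: Iterable[str],
               partitioned: bool = True) -> dict:
    """Request body for POST /{db}/_index (a JSON-type Mango index)."""
    return {
        "index": {"fields": list(fields)},
        "ddoc": ddoc,  # indexes sharing a ddoc end up in one design document
        "name": name,
        "type": "json",
        "partitioned": partitioned,
    }


# Hypothetical layout mirroring the issue: 2 indexes in one ddoc, 1 in another
INDEXES: List[dict] = [
    index_body("idx-a", "by-type", ["type"]),
    index_body("idx-a", "by-type-and-date", ["type", "created_at"]),
    index_body("idx-b", "by-owner", ["owner"]),
]


def create_indexes(send: Send, db: str) -> List[str]:
    """POST each index; CouchDB answers {"result": "created"} or {"result": "exists"}."""
    results = []
    for body in INDEXES:
        status, resp = send("POST", f"/{db}/_index", body)
        if status != 200:
            raise RuntimeError(f"index {body['name']} failed: HTTP {status}")
        results.append(resp["result"])
    return results
```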
If we run the tests module by module (each module contains between 1 and 10 tests), there is no issue. But if we run all of them in one go (sequentially, without parallelism), at some point the tests hang.
**What we observed**
- CouchDB stops writing logs (and its CPU consumption decreases)
- We still get responses to queries for some time, approx. 1 minute
- After that, CouchDB does not respond anymore
**Example** (the exact point where it stops varies between runs)
- Last log lines written (querying a view with `POST /master/_partition/iam/_find`). After that, CouchDB's CPU consumption drops.
```bash
[debug] 2020-08-17T19:30:05.663402Z nonode@nohost <0.20917.0> a13fe062bc no record of user admin
[debug] 2020-08-17T19:30:05.663457Z nonode@nohost <0.20917.0> a13fe062bc timeout 600
[debug] 2020-08-17T19:30:05.663495Z nonode@nohost <0.20917.0> a13fe062bc Successful cookie auth as: "admin"
[notice] 2020-08-17T19:30:05.665187Z nonode@nohost <0.20917.0> a13fe062bc 127.0.0.1:5984 172.28.0.1 admin POST /master/_partition/iam/_find 200 ok 2
[debug] 2020-08-17T19:30:05.677477Z nonode@nohost <0.20917.0> be296403e4 no record of user admin
[debug] 2020-08-17T19:30:05.677539Z nonode@nohost <0.20917.0> be296403e4 timeout 600
[debug] 2020-08-17T19:30:05.677570Z nonode@nohost <0.20917.0> be296403e4 Successful cookie auth as: "admin"
```
- Request captured with tcpdump

- After some time (1 min 9 s), the request times out
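One way to pinpoint the stuck request in logs like the ones above is to group lines by the request ID column (`a13fe062bc`, `be296403e4`) and report IDs that never produced a `[notice]` access-log line. In the example, `be296403e4` authenticated successfully but has no completion line. The sketch below is a diagnostic aid, not part of CouchDB; it assumes one log entry per line in the format shown, and the access-line pattern is inferred from the single `[notice]` line in the example.

```python
import re
from typing import Iterable, List

# [level] timestamp node pid request_id message
LINE_RE = re.compile(
    r"^\[(?P<level>\w+)\] (?P<ts>\S+) (?P<node>\S+) (?P<pid>\S+) (?P<req>\S+) (?P<msg>.*)$"
)
# Access-log message: host:port peer user METHOD path status ...
ACCESS_RE = re.compile(r"^\S+ \S+ \S+ [A-Z]+ \S+ (?P<status>\d{3})\b")


def unfinished_requests(lines: Iterable[str]) -> List[str]:
    """Return request IDs that were logged but never got an access-log line."""
    seen = {}  # dict keeps first-seen order
    finished = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["req"] == "--------":  # "--------" = no request context
            continue
        seen.setdefault(m["req"], m["ts"])
        if m["level"] == "notice" and ACCESS_RE.match(m["msg"]):
            finished.add(m["req"])
    return [req for req in seen if req not in finished]
```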

**Hardware and environment**
- The host machine runs Ubuntu 20.04, with 8 cores, 16 GB RAM and a 512 GB SSD (250 GB free)
- The Docker container has no CPU/memory/disk limits
- CouchDB is configured as a single node (full configuration below)
This is what the container processes look like. Erlang (`beam.smp`) is the biggest consumer, with peaks of 70% CPU.

Threads of `beam.smp`, just when the issue happens:
I monitored the threads during the whole run. Just before the hang, the scheduler threads seem to increase their activity.
```bash
top - 20:55:05 up 2:58, 0 users, load average: 3.93, 2.31, 2.35
Threads: 46 total, 0 running, 46 sleeping, 0 stopped, 0 zombie
%Cpu(s): 46.2 us, 4.3 sy, 0.0 ni, 49.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15687.8 total, 1947.6 free, 7294.3 used, 6445.9 buff/cache
MiB Swap: 980.0 total, 980.0 free, 0.0 used. 7200.1 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   54 couchdb   20   0 4568544  62092  10904 S  40.0   0.4   0:07.25 2_scheduler
   53 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:20.74 1_scheduler
   55 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:06.63 3_scheduler
   56 couchdb   20   0 4568544  62092  10904 S  20.0   0.4   0:05.66 4_scheduler
   37 couchdb   20   0 4568544  62092  10904 S   6.7   0.4   0:00.03 async_2
    6 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.03 beam.smp
   34 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_sig_dispatc
   35 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_msg_dispatc
   36 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.18 async_1
   38 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.14 async_3
.....
```
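To track the scheduler threads over a whole test run rather than in one snapshot, one option is to sample `top -H -b -n 1 -p <pid>` periodically and sum the `%CPU` of the BEAM scheduler threads. The parser below is a minimal sketch that assumes the default `top` column order shown above (`PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND`) and thread names ending in `_scheduler`; how and how often the samples are collected is left open.

```python
from typing import Dict, Iterable


def thread_cpu(top_lines: Iterable[str]) -> Dict[str, float]:
    """Map thread name -> %CPU from `top -H -b` output rows."""
    usage = {}
    for line in top_lines:
        cols = line.split()
        # Data rows start with a numeric PID and have at least 12 columns
        if len(cols) >= 12 and cols[0].isdigit():
            usage[" ".join(cols[11:])] = float(cols[8])
    return usage


def scheduler_cpu(top_lines: Iterable[str]) -> float:
    """Sum %CPU over BEAM scheduler threads (names like '1_scheduler')."""
    return sum(cpu for name, cpu in thread_cpu(top_lines).items()
               if name.endswith("_scheduler"))
```

Logging `scheduler_cpu(...)` with a timestamp next to the CouchDB log would show whether the scheduler spike lines up with the last request ID that never completed.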
**Any idea what could be causing this? Any hint would be appreciated.**
## Steps to Reproduce
There is no fixed trigger for the issue.
## Expected Behaviour
We would expect CouchDB to handle this load, even when running in Docker. The load is not heavy: during the tests we issue at most 15 operations per second.
## Your Environment
```json
{"couchdb":"Welcome","version":"3.1.0","git_sha":"ff0feea20","uuid":"b17edbd1de7d1504022d6f359ff9a4f8","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
```
* CouchDB version used: 3.1.0
* Browser name and version: Not relevant
* Operating system and version: Couchdb [email protected]
## Additional Context
**Full couchdb configuration**
```bash
Configuration Settings:
[admins] admin="******"
[attachments] compressible_types="text/*, application/javascript, application/json, application/xml"
[attachments] compression_level="8"
[chttpd] backlog="512"
[chttpd] bind_address="any"
[chttpd] max_db_number_for_dbs_info_req="100"
[chttpd] port="5984"
[chttpd] prefer_minimal="Cache-Control, Content-Length, Content-Range, Content-Type, ETag, Server, Transfer-Encoding, Vary"
[chttpd] require_valid_user="false"
[chttpd] server_options="[{backlog, 512}, {acceptor_pool_size, 64}, {max, 4096}]"
[chttpd] socket_options="[{sndbuf, 262144}, {nodelay, true}]"
[cluster] n="3"
[cluster] q="2"
[cors] credentials="false"
[couch_httpd_auth] allow_persistent_cookies="true"
[couch_httpd_auth] auth_cache_size="50"
[couch_httpd_auth] authentication_db="_users"
[couch_httpd_auth] authentication_redirect="/_utils/session.html"
[couch_httpd_auth] iterations="10"
[couch_httpd_auth] require_valid_user="false"
[couch_httpd_auth] secret="00464db7ba8beb6a5915e4f5dbd03a49"
[couch_httpd_auth] timeout="600"
[couch_peruser] database_prefix="userdb-"
[couch_peruser] delete_dbs="false"
[couch_peruser] enable="false"
[couchdb] attachment_stream_buffer_size="4096"
[couchdb] changes_doc_ids_optimization_threshold="100"
[couchdb] database_dir="./data"
[couchdb] default_engine="couch"
[couchdb] default_security="admin_only"
[couchdb] file_compression="snappy"
[couchdb] max_dbs_open="10000"
[couchdb] max_document_size="8000000"
[couchdb] os_process_timeout="20000"
[couchdb] single_node="true"
[couchdb] users_db_security_editable="false"
[couchdb] uuid="b17edbd1de7d1504022d6f359ff9a4f8"
[couchdb] view_index_dir="./data"
[couchdb_engines] couch="couch_bt_engine"
[csp] enable="true"
[fabric] request_timeout="infinity"
[feature_flags] partitioned||*="true"
[httpd] allow_jsonp="false"
[httpd] authentication_handlers="{couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}"
[httpd] bind_address="127.0.0.1"
[httpd] enable_cors="false"
[httpd] enable_xframe_options="false"
[httpd] max_http_request_size="4294967296"
[httpd] port="5986"
[httpd] secure_rewrites="true"
[httpd] socket_options="[{sndbuf, 262144}]"
[indexers] couch_mrview="true"
[ioq] concurrency="10"
[ioq] ratio="0.01"
[ioq.bypass] compaction="false"
[ioq.bypass] os_process="true"
[ioq.bypass] read="true"
[ioq.bypass] shard_sync="false"
[ioq.bypass] view_update="true"
[ioq.bypass] write="true"
[log] level="debug"
[log] writer="stderr"
[query_server_config] os_process_limit="2000"
[query_server_config] os_process_soft_limit="1000"
[query_server_config] reduce_limit="true"
[replicator] connection_timeout="30000"
[replicator] http_connections="20"
[replicator] interval="60000"
[replicator] max_churn="20"
[replicator] max_jobs="500"
[replicator] retries_per_request="5"
[replicator] socket_options="[{keepalive, true}, {nodelay, false}]"
[replicator] ssl_certificate_max_depth="3"
[replicator] startup_jitter="5000"
[replicator] verify_ssl_certificates="false"
[replicator] worker_batch_size="500"
[replicator] worker_processes="4"
[ssl] port="6984"
[uuids] algorithm="sequential"
[uuids] max_count="1000"
[vendor] name="The Apache Software Foundation"
```
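The configuration above still carries the clustered defaults `[cluster] n="3"` and `q="2"` while the node runs with `single_node="true"`, which is consistent with the `Request to create N=3 DB but only 1 node(s)` messages quoted below. A minimal sketch of setting `n` to 1 through CouchDB's configuration API (`PUT /_node/_local/_config/{section}/{key}`, whose body is a JSON string and whose response is the previous value) follows. The `send(method, path, body)` transport is hypothetical, and setting `n = 1` under `[cluster]` in the container's `local.ini` is an equivalent alternative. Nothing here establishes a link to the hang itself.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status, parsed_json)
Send = Callable[[str, str, Optional[object]], Tuple[int, object]]


def set_config(send: Send, section: str, key: str, value: str,
               node: str = "_local") -> str:
    """Set one config value via the _config API and return the previous value."""
    # The body is a JSON string such as "1", not a bare JSON number.
    # For [cluster] n, only databases created afterwards are affected.
    status, old = send("PUT", f"/_node/{node}/_config/{section}/{key}", value)
    if status != 200:
        raise RuntimeError(f"[{section}] {key}: HTTP {status}")
    return old
```

Usage: `set_config(send, "cluster", "n", "1")` before the test suite creates its databases.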
We also saw some errors at startup. They do not seem relevant to this case.
- The `_users` database does not exist (but it seems to be created afterwards):
```bash
[error] 2020-08-17T19:18:47.731011Z nonode@nohost emulator -------- Error in process <0.372.0> with exit value:
{database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,374}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,403}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,96}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,198}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,145}]}]}
```
- I suppose this one has no effect, since CouchDB is configured as a single node (`[couchdb] single_node="true"`):
```bash
[error] 2020-08-17T19:18:47.799365Z nonode@nohost <0.457.0> -------- Request to create N=3 DB but only 1 node(s)
[error] 2020-08-17T19:18:47.812920Z nonode@nohost <0.457.0> -------- Request to create N=3 DB but only 1 node(s)
```
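The `_users` error above shows the authentication cache looking for `_users` before that database exists. A common single-node setup step is to create the system databases explicitly when the container starts. A minimal idempotent sketch follows, assuming a hypothetical `send(method, path, body)` transport that returns the HTTP status instead of raising on 4xx; it treats `412 Precondition Failed` (CouchDB's `file_exists` answer for an existing database) as success.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status, parsed_json)
Send = Callable[[str, str, Optional[dict]], Tuple[int, object]]

SYSTEM_DBS = ("_users", "_replicator")


def ensure_system_dbs(send: Send,
                      names: Iterable[str] = SYSTEM_DBS) -> Dict[str, str]:
    """Create each system database if missing; report 'created' or 'exists'."""
    outcome = {}
    for name in names:
        status, _ = send("PUT", f"/{name}", None)
        if status in (201, 202):
            outcome[name] = "created"
        elif status == 412:  # file_exists: the database is already there
            outcome[name] = "exists"
        else:
            raise RuntimeError(f"could not create {name}: HTTP {status}")
    return outcome
```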