schneuwlym opened a new issue #3571:
URL: https://github.com/apache/couchdb/issues/3571
## Description
We have an issue with our CouchDB 3.1.1. We are using the default compaction
configuration, and it works fine until the database reaches a certain number
of documents (~76K). Then the compaction dies and is no longer able to finish
its task. The compaction is restarted every 2 seconds and always dies
immediately. So far the problem has been consistent, and I haven't found any
way (except deleting the database) to fix it.

I read some other compaction-related issues, but in my case only version
3.1.1 was ever used, so there was no upgrade, no migration, or anything similar.
What I tried so far:

* I removed the compaction files manually and restarted CouchDB. Compaction
  fails again.
* I rebooted the node. Compaction fails again.
* Following issues #3292 and #2941, I built my own version based on 3.1.1
  that includes the following two changes. Compaction still fails.
  * fix race condition (#3150)
  * add remonitor code to DOWN message (#3144)
* At first the slack compactor always failed; after I disabled it (see the
  sketch after this list), the ratio_dbs compactor failed as well.
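Disabling the slack channel means removing `slack_dbs` from smoosh's
`db_channels` (the resulting value is visible in the configuration dump at the
end of this report). The following is a minimal sketch of that change through
the node-local `_config` API, using python-cloudant's underlying requests
session; the `_local` node alias is an assumption, substitute the real node
name if it doesn't resolve in your setup:
```
from cloudant.client import CouchDB

BASE = 'http://localhost:5984'
cdb = CouchDB('admin', 'admin', url=BASE, connect=True)

# PUT the new channel list; _config expects a JSON-encoded string body,
# which requests' json= kwarg produces. The response is the old value.
url = '{}/_node/_local/_config/smoosh/db_channels'.format(BASE)
resp = cdb.r_session.put(url, json='upgrade_dbs,ratio_dbs')
print(resp.status_code, resp.text)
```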
This is the log output, which is repeated every two seconds (note the `undef`
at the head of each trace, meaning the runtime could not find `math:ceil/1`):
```
[notice] 2021-05-19T14:42:38.848090Z [email protected] <0.460.0> -------- ratio_dbs: adding <<"shards/80000000-ffffffff/directory.1621404274">> to internal compactor queue with priority 2.100073355455779
[info] 2021-05-19T14:42:38.848533Z [email protected] <0.5146.0> -------- Starting compaction for db "shards/80000000-ffffffff/directory.1621404274" at 40726
[notice] 2021-05-19T14:42:38.848615Z [email protected] <0.460.0> -------- ratio_dbs: Starting compaction for shards/80000000-ffffffff/directory.1621404274 (priority 2.100073355455779)
[notice] 2021-05-19T14:42:38.849705Z [email protected] <0.460.0> -------- ratio_dbs: Started compaction for shards/80000000-ffffffff/directory.1621404274
[warning] 2021-05-19T14:42:38.893633Z [email protected] <0.460.0> -------- exit for compaction of ["shards/80000000-ffffffff/directory.1621404274"]: {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
[error] 2021-05-19T14:42:38.894691Z [email protected] emulator -------- Error in process <0.5148.0> on node '[email protected]' with exit value: {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
[info] 2021-05-19T14:42:38.894453Z [email protected] <0.226.0> -------- db shards/80000000-ffffffff/directory.1621404274 died with reason {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
```
While the problem is present, inserting data is still possible, but I often
get the following error message (I'm using python-cloudant):
```
500 Server Error: Internal Server Error unknown_error undefined for url: http://localhost:5984/directory
```
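A client-side retry loop can paper over these intermittent 500s while the
underlying problem persists; here is a minimal sketch (the helper name, retry
count, and delay are illustrative, not part of python-cloudant):
```
import time

def insert_with_retry(db, doc, retries=5, delay=2.0):
    # db is a python-cloudant CouchDatabase, doc a plain dict; retry the
    # create on any failure and re-raise once the attempts are used up.
    for attempt in range(1, retries + 1):
        try:
            return db.create_document(doc, throw_on_exists=True)
        except Exception as err:
            print('insert attempt {} failed: {}'.format(attempt, err))
            time.sleep(delay)
    raise RuntimeError('insert failed after {} attempts'.format(retries))
```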
## Steps to Reproduce
1. Start with a clean database.
2. Create a script that creates documents in an endless loop (pure JSON, no
   attachments, just one revision each).
3. After around 76K documents the compactor starts to fail.
4. Inserts are still possible, but time and again an insert fails with the
   500 Server Error shown above.

I ran the stress test above on 3 nodes in parallel. All 3 nodes started to
fail at around the same number of documents (70K-80K).

* On the first node, I created the documents single-threaded.
* On the second node, I created the documents using two threads.
* On the third node, I created the documents using four threads.
The following is the script I used to reproduce the issue in my setup:
```
#!/usr/bin/env python
import signal
import sys

from cloudant.client import CouchDB
from cloudant.document import Document
from copy import deepcopy
from threading import Thread

USERNAME = 'admin'
PASSWORD = 'admin'
COUCHDB_URL = 'http://localhost:5984'
DB_NAME = 'directory'

cdb = CouchDB(USERNAME, PASSWORD, url=COUCHDB_URL, connect=True,
              auto_renew=True)

# Template for every document: pure JSON, no attachments.
account_skeletton = {'parameter 1': 0,
                     'parameter 2': True,
                     'parameter 3': '',
                     'parameter 4': '',
                     'parameter 5': [],
                     'parameter 6': [],
                     'description': '',
                     'enabled': True,
                     'firstname': '',
                     'parameter 7': False,
                     'lastname': '',
                     'parameter 8': '',
                     'number': '',
                     'parameter 9': '9301162291d5a0480270d97d6c4a6da3edd75aa5',
                     'parameter 10': 'cos02',
                     'parameter 11': '112233',
                     'parameter 12': 1620118266.572422,
                     'parameter 13': 0,
                     'parameter 14': 0.0,
                     'parameter 15': False,
                     'parameter 16': 4,
                     'parameter 17': '',
                     'parameter 18': '',
                     'parameter 19': 'user',
                     'userid': '',
                     'parameter 20': '',
                     'parameter 21': '',
                     'parameter 22': True}

if DB_NAME not in cdb.all_dbs():
    cdb.create_database(DB_NAME)


def signal_handler(sig, frame):
    print('You pressed Ctrl+C!')
    sys.exit(0)


def create_documents(start=0, thread_id=0):
    # Insert documents in an (almost) endless loop; each document gets a
    # unique id built from the thread id and a running counter.
    try:
        for i in xrange(start, 999999):
            number = '{}{:06}'.format(thread_id, i)
            print('create_documents: Creating document {}'.format(number))
            with Document(cdb[DB_NAME], number) as document:
                document.update(deepcopy(account_skeletton))
                document['firstname'] = 'FN {}'.format(number)
                document['lastname'] = 'LN {}'.format(number)
                document['number'] = number
                document['userid'] = number
    except Exception as err:
        print('create_documents: {}'.format(err))


def create_documents_threaded(threads=2):
    for i in xrange(threads):
        t = Thread(target=create_documents, args=(0, i))
        t.daemon = True
        t.start()


# Start the workload; I varied the thread count per node (1, 2 and 4).
create_documents_threaded(threads=2)

signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()
```
## Expected Behaviour
Compaction doesn't fail :-)
## Your Environment
* CouchDB version used:
  `{"couchdb":"Welcome","version":"3.1.1","git_sha":"ce596c65d","uuid":"08fb7cd0a10f35f6215a531742f7b356","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}`
* python-cloudant: 2.14.0
* Python 2.7
* Operating system and version:
  * Our own Linux distribution
  * CouchDB running in a VM
  * Single core (also changed to 2 cores, no difference)
  * 1 GB RAM (also increased it, no difference)
* To trigger this issue, I used an isolated node: no replication, no
  clustering
## Additional Context
Below you can find the configuration; most of it is default:
```
curl http://admin:admin@localhost:5984/_node/[email protected]/_config | python -m json.tool
{
"admins": {
"admin":
"-pbkdf2-d5b128e39ebe61b4f50fb9c2e3241c0ea1bc28f9,6b6e6d21c67f685f753d8fa1fe72db71,10"
},
"attachments": {
"compressible_types": "text/*, application/javascript,
application/json, application/xml",
"compression_level": "8"
},
"chttpd": {
"backlog": "512",
"bind_address": "0.0.0.0",
"max_db_number_for_dbs_info_req": "100",
"port": "5984",
"prefer_minimal": "Cache-Control, Content-Length, Content-Range,
Content-Type, ETag, Server, Transfer-Encoding, Vary",
"require_valid_user": "false",
"server_options": "[{recbuf, undefined}]",
"socket_options": "[{sndbuf, 262144}, {nodelay, true}]"
},
"cluster": {
"n": "3",
"q": "2"
},
"cors": {
"credentials": "false"
},
"couch_httpd_auth": {
"allow_persistent_cookies": "true",
"auth_cache_size": "50",
"authentication_db": "_users",
"authentication_redirect": "/_utils/session.html",
"iterations": "10",
"require_valid_user": "false",
"secret": "a0ec90afc5f896e3cf90e8c4adc9dafa",
"timeout": "600"
},
"couch_peruser": {
"database_prefix": "userdb-",
"delete_dbs": "false",
"enable": "false"
},
"couchdb": {
"attachment_stream_buffer_size": "4096",
"changes_doc_ids_optimization_threshold": "100",
"database_dir": "/var/crypt/couchdb/couchdb",
"default_engine": "couch",
"default_security": "everyone",
"file_compression": "snappy",
"max_dbs_open": "500",
"max_document_size": "8000000",
"os_process_timeout": "5000",
"single_node": "true",
"users_db_security_editable": "false",
"uuid": "08fb7cd0a10f35f6215a531742f7b356",
"view_index_dir": "/var/crypt/couchdb/couchdb"
},
"couchdb_engines": {
"couch": "couch_bt_engine"
},
"csp": {
"enable": "true"
},
"feature_flags": {
"partitioned||*": "true"
},
"httpd": {
"allow_jsonp": "false",
"authentication_handlers": "{couch_httpd_auth,
cookie_authentication_handler}, {couch_httpd_auth,
default_authentication_handler}",
"bind_address": "127.0.0.1",
"enable_cors": "false",
"enable_xframe_options": "false",
"max_http_request_size": "4294967296",
"port": "5986",
"secure_rewrites": "true",
"socket_options": "[{sndbuf, 262144}]"
},
"indexers": {
"couch_mrview": "true"
},
"ioq": {
"concurrency": "10",
"ratio": "0.01"
},
"ioq.bypass": {
"compaction": "false",
"os_process": "true",
"read": "true",
"shard_sync": "false",
"view_update": "true",
"write": "true"
},
"log": {
"file": "/var/log/couchdb/couchdb.log",
"level": "info",
"writer": "file"
},
"query_server_config": {
"os_process_limit": "100",
"reduce_limit": "true"
},
"replicator": {
"connection_timeout": "30000",
"http_connections": "20",
"interval": "60000",
"max_churn": "20",
"max_jobs": "500",
"retries_per_request": "5",
"socket_options": "[{keepalive, true}, {nodelay, false}]",
"ssl_certificate_max_depth": "3",
"startup_jitter": "5000",
"verify_ssl_certificates": "true",
"worker_batch_size": "500",
"worker_processes": "4"
},
"smoosh": {
"db_channels": "upgrade_dbs,ratio_dbs",
"view_channels": "upgrade_views,ratio_views"
},
"ssl": {
"port": "6984"
},
"uuids": {
"algorithm": "sequential",
"max_count": "1000"
},
"vendor": {
"name": "The Apache Software Foundation"
}
}
```
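For anyone trying to reproduce this: compaction can also be kicked off on
demand and watched through the task list, which makes the failure easy to
observe. The following is a minimal sketch using the standard
`POST /{db}/_compact` and `GET /_active_tasks` endpoints and the same
credentials as above:
```
import time
from cloudant.client import CouchDB

BASE = 'http://localhost:5984'
cdb = CouchDB('admin', 'admin', url=BASE, connect=True)

# Kick off compaction for the affected database; this needs admin rights
# and an explicit JSON content type.
resp = cdb.r_session.post('{}/directory/_compact'.format(BASE),
                          headers={'Content-Type': 'application/json'})
print(resp.status_code, resp.text)

# Poll the task list; with this issue present, the database_compaction
# task disappears again almost immediately instead of making progress,
# matching the two-second loop in the log above.
for _ in range(10):
    tasks = cdb.r_session.get('{}/_active_tasks'.format(BASE)).json()
    print([t for t in tasks if t.get('type') == 'database_compaction'])
    time.sleep(2)
```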