cg1972 opened a new issue, #90:
URL: https://github.com/apache/couchdb-helm/issues/90
**Describe the bug**
We used the Helm chart to install a 3-node CouchDB cluster. One of the nodes
in the cluster (the coordinator node) restarts regularly, usually once a day.
The CouchDB pod reports this error:
`Container couchdb failed liveness probe, will be restarted`
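For context, a minimal sketch of how a liveness probe leads to a restart. It assumes the Kubernetes default probe settings (`timeoutSeconds: 1`, `failureThreshold: 3`); the values this chart actually configures may differ. The function name and the probe sequence are hypothetical and only illustrate the mechanism.

```python
def restarts_after(probe_results, timeout_s=1.0, failure_threshold=3):
    """Return the index of the probe that triggers a restart, or None.

    probe_results: list of (http_status, duration_seconds), one per probe, in order.
    timeout_s and failure_threshold default to the Kubernetes defaults (assumed here).
    """
    consecutive_failures = 0
    for i, (status, duration_s) in enumerate(probe_results):
        timed_out = duration_s > timeout_s
        bad_status = not (200 <= status < 400)
        if timed_out or bad_status:
            consecutive_failures += 1
        else:
            # Any success resets the failure counter.
            consecutive_failures = 0
        if consecutive_failures >= failure_threshold:
            return i
    return None

# Hypothetical sequence: one fast success, then three slow or failed probes.
probes = [(200, 0.05), (200, 14.9), (500, 15.1), (200, 29.6)]
print(restarts_after(probes))  # → 3
```

Note that a probe answered with HTTP 200 still counts as a failure if the response arrives after the timeout.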
The CouchDB logs show the following errors:
`[notice] 2022-06-23T06:23:43.733589Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30690.22> b938e4d3fa 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 29613
[notice] 2022-06-23T06:23:45.153723Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30707.22> 7bc4d55e3e 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 14872
[notice] 2022-06-23T06:23:45.154154Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30697.22> d48973481d 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 13
[error] 2022-06-23T06:23:45.273373Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30706.22> 45a001ddb1 req_err(2751202856) timeout : The request could not be
processed in a reasonable amount of time.
[<<"gen_server:call/2 L238">>,<<"chttpd_misc:handle_up_req/1
L274">>,<<"chttpd:handle_req_after_auth/2 L327">>,<<"chttpd:process_request/1
L310">>,<<"chttpd:handle_request_int/1 L249">>,<<"mochiweb_http:headers/6
L150">>,<<"proc_lib:init_p_do_apply/3 L226">>]
[error] 2022-06-23T06:23:45.274458Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30691.22> c975e456eb req_err(2751202856) timeout : The request could not be
processed in a reasonable amount of time.
[<<"gen_server:call/2 L238">>,<<"chttpd_misc:handle_up_req/1
L274">>,<<"chttpd:handle_req_after_auth/2 L327">>,<<"chttpd:process_request/1
L310">>,<<"chttpd:handle_request_int/1 L249">>,<<"mochiweb_http:headers/6
L150">>,<<"proc_lib:init_p_do_apply/3 L226">>]
[notice] 2022-06-23T06:23:45.274005Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30706.22> 45a001ddb1 192.168.230.108:5984 10.1.2.179 undefined GET /_up 500
ok 15095
[notice] 2022-06-23T06:23:45.274958Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30691.22> c975e456eb 192.168.230.108:5984 10.1.2.179 undefined GET /_up 500
ok 34630
[error] 2022-06-23T06:23:45.915833Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.23685.4>
-------- gen_server couch_prometheus_server terminated with reason:
{timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} at
gen_server:call/2(line:238) <= couch_stats_aggregator:fetch/0(line:44) <=
couch_prometheus_server:get_couchdb_stats/0(line:94) <=
couch_prometheus_server:refresh_metrics/0(line:87) <=
couch_prometheus_server:handle_info/2(line:74) <=
gen_server:try_dispatch/4(line:689) <= gen_server:handle_msg/6(line:765) <=
proc_lib:init_p_do_apply/3(line:226)
last msg: redacted
state: {st,<<"# TYPE couchdb_couch_log_requests_total
counter\ncouchdb_couch_log_requests_total{level=\"alert\"}
0\ncouchdb_couch_log_requests_total{level=\"critical\"}
0\ncouchdb_couch_log_requests_total{level=\"debug\"}
0\ncouchdb_couch_log_requests_total{level=\"emergency\"}
0\ncouchdb_couch_log_requests_total{level=\"error\"}
0\ncouchdb_couch_log_requests_total{level=\"info\"}
7\ncouchdb_couch_log_requests_total{level=\"notice\"}
18573\ncouchdb_couch_log_requests_total{level=\"warning\"} 0\n# TYPE
couchdb_couch_replicator_changes_manager_deaths_total
counter\ncouchdb_couch_replicator_changes_manager_deaths_total 0\n# TYPE
couchdb_couch_replicator_changes_queue_deaths_total
counter\ncouchdb_couch_replicator_changes_queue_deaths_total 0\n# TYPE
couchdb_couch_replicator_changes_read_failures_total
counter\ncouchdb_couch_replicator_changes_read_failures_total 0\n# TYPE
couchdb_couch_replicator_changes_reader_deaths_total
counter\ncouchdb_couch_replicator_changes_reader_deaths
_total 0\n# TYPE couchdb_couch_replicator_checkpoints_failure_total
counter\ncouchdb_couch_replicator_checkpoints_failure_total 0\n# TYPE
couchdb_couch_replicator_checkpoints_total
counter\ncouchdb_couch_replicator_checkpoints_total 0\n# TYPE
couchdb_couch_replicator_cluster_is_stable
gauge\ncouchdb_couch_replicator_cluster_is_stable 1\n# TYPE
couchdb_couch_replicator_connection_acquires_total
counter\ncouchdb_couch_replicator_connection_acquires_total 0\n# TYPE
couchdb_couch_replicator_connection_closes_total
counter\ncouchdb_couch_replicator_connection_closes_total 0\n# TYPE
couchdb_couch_replicator_connection_creates_total
counter\ncouchdb_couch_replicator_connection_creates_total 0\n# TYPE
couchdb_couch_replicator_connection_owner_crashes_total
counter\ncouchdb_couch_replicator_connection_owner_crashes_total 0\n# TYPE
couchdb_couch_replicator_connection_releases_total
counter\ncouchdb_couch_replicator_connection_releases_total 0\n# TYPE
couchdb_couch_replicator_connection_worker
_crashes_total
counter\ncouchdb_couch_replicator_connection_worker_crashes_total 0\n# TYPE
couchdb_couch_replicator_db_scans_total
counter\ncouchdb_couch_replicator_db_scans_total 1\n# TYPE
couchdb_couch_replicator_docs_completed_state_updates_total
counter\ncouchdb_couch_replicator_docs_completed_state_updates_total 0\n# TYPE
couchdb_couch_replicator_docs_db_changes_total
counter\ncouchdb_couch_replicator_docs_db_changes_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_created_total
counter\ncouchdb_couch_replicator_docs_dbs_created_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_deleted_total
counter\ncouchdb_couch_replicator_docs_dbs_deleted_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_found_total
counter\ncouchdb_couch_replicator_docs_dbs_found_total 2\n# TYPE
couchdb_couch_replicator_docs_failed_state_updates_total
counter\ncouchdb_couch_replicator_docs_failed_state_updates_total 0\n# TYPE
couchdb_couch_replicator_failed_starts_total
counter\ncouchdb_couch_replicator_fa
iled_starts_total 0\n# TYPE couchdb_couch_replicator_jobs_adds_total
counter\ncouchdb_couch_replicator_jobs_adds_total 0\n# TYPE
couchdb_couch_replicator_jobs_crashed
gauge\ncouchdb_couch_replicator_jobs_crashed 0\n# TYPE
couchdb_couch_replicator_jobs_crashes_total
counter\ncouchdb_couch_replicator_jobs_crashes_total 0\n# TYPE
couchdb_couch_replicator_jobs_duplicate_adds_total
counter\ncouchdb_couch_replicator_jobs_duplicate_adds_total 0\n# TYPE
couchdb_couch_replicator_jobs_pending
gauge\ncouchdb_couch_replicator_jobs_pending 0\n# TYPE
couchdb_couch_replicator_jobs_removes_total
counter\ncouchdb_couch_replicator_jobs_removes_total 0\n# TYPE
couchdb_couch_replicator_jobs_running
gauge\ncouchdb_couch_replicator_jobs_running 0\n# TYPE
couchdb_couch_replicator_jobs_starts_total
counter\ncouchdb_couch_replicator_jobs_starts_total 0\n# TYPE
couchdb_couch_replicator_jobs_stops_total
counter\ncouchdb_couch_replicator_jobs_stops_total 0\n# TYPE
couchdb_couch_replicator_jobs_total gauge\ncou
chdb_couch_replicator_jobs_total 0\n# TYPE
couchdb_couch_replicator_requests_total
counter\ncouchdb_couch_replicator_requests_total 0\n# TYPE
couchdb_couch_replicator_responses_failure_total
counter\ncouchdb_couch_replicator_responses_failure_total 0\n# TYPE
couchdb_couch_replicator_responses_total
counter\ncouchdb_couch_replicator_responses_total 0\n# TYPE
couchdb_couch_replicator_stream_responses_failure_total
counter\ncouchdb_couch_replicator_stream_responses_failure_total 0\n# TYPE
couchdb_couch_replicator_stream_responses_total
counter\ncouchdb_couch_replicator_stream_responses_total 0\n# TYPE
couchdb_couch_replicator_worker_deaths_total
counter\ncouchdb_couch_replicator_worker_deaths_total 0\n# TYPE
couchdb_couch_replicator_workers_started_total
counter\ncouchdb_couch_replicator_workers_started_total 0\n# TYPE
couchdb_auth_cache_requests_total counter\ncouchdb_auth_cache_requests_total
0\n# TYPE couchdb_auth_cache_misses_total
counter\ncouchdb_auth_cache_misses_total 0\n# TYPE
couchdb_collect_results_time_seconds
summary\ncouchdb_collect_results_time_seconds{quantile=\"0.5\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.75\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.9\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.95\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.99\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.999\"}
0.0\ncouchdb_collect_results_time_seconds_sum
0.0\ncouchdb_collect_results_time_seconds_count 0\n# TYPE
couchdb_couch_server_lru_skip_total
counter\ncouchdb_couch_server_lru_skip_total 0\n# TYPE
couchdb_database_purges_total counter\ncouchdb_database_purges_total 0\n# TYPE
couchdb_database_reads_total counter\ncouchdb_database_reads_total 24\n# TYPE
couchdb_database_writes_total counter\ncouchdb_database_writes_total 0\n# TYPE
couchdb_db_open_time_seconds
summary\ncouchdb_db_open_time_seconds{quantile=\"0.5\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.75\"} 0.0\ncouchdb_db_open_
time_seconds{quantile=\"0.9\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.95\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.99\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.999\"}
0.0\ncouchdb_db_open_time_seconds_sum 0.0\ncouchdb_db_open_time_seconds_count
0\n# TYPE couchdb_dbinfo_seconds
summary\ncouchdb_dbinfo_seconds{quantile=\"0.5\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.75\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.9\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.95\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.99\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.999\"} 0.0\ncouchdb_dbinfo_seconds_sum
0.0\ncouchdb_dbinfo_seconds_count 0\n# TYPE couchdb_document_inserts_total
counter\ncouchdb_document_inserts_total 7\n# TYPE
couchdb_document_purges_failure_total
counter\ncouchdb_document_purges_failure_total 0\n# TYPE
couchdb_document_purges_success_total
counter\ncouchdb_document_purges_success_total 0\n# TYPE
couchdb_document_purges_total_total counter\ncouchdb_document_p
urges_total_total 0\n# TYPE couchdb_document_writes_total
counter\ncouchdb_document_writes_total 14\n# TYPE
couchdb_httpd_aborted_requests_total
counter\ncouchdb_httpd_aborted_requests_total 0\n# TYPE
couchdb_httpd_all_docs_timeouts_total
counter\ncouchdb_httpd_all_docs_timeouts_total 0\n# TYPE
couchdb_httpd_bulk_docs_seconds
summary\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.5\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.75\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.9\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.95\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.99\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.999\"}
0.0\ncouchdb_httpd_bulk_docs_seconds_sum
0.0\ncouchdb_httpd_bulk_docs_seconds_count 0\n# TYPE
couchdb_httpd_bulk_requests_total counter\ncouchdb_httpd_bulk_requests_total
0\n# TYPE couchdb_httpd_clients_requesting_changes_total
counter\ncouchdb_httpd_clients_requesting_changes_total 0\n...">>,...}
extra: []
[error] 2022-06-23T06:23:45.938589Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.23685.4>
-------- gen_server couch_prometheus_server terminated with reason:
{timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} at
gen_server:call/2(line:238) <= couch_stats_aggregator:fetch/0(line:44) <=
couch_prometheus_server:get_couchdb_stats/0(line:94) <=
couch_prometheus_server:refresh_metrics/0(line:87) <=
couch_prometheus_server:handle_info/2(line:74) <=
gen_server:try_dispatch/4(line:689) <= gen_server:handle_msg/6(line:765) <=
proc_lib:init_p_do_apply/3(line:226)
last msg: redacted
state: {st,<<"# TYPE couchdb_couch_log_requests_total
counter\ncouchdb_couch_log_requests_total{level=\"alert\"}
0\ncouchdb_couch_log_requests_total{level=\"critical\"}
0\ncouchdb_couch_log_requests_total{level=\"debug\"}
0\ncouchdb_couch_log_requests_total{level=\"emergency\"}
0\ncouchdb_couch_log_requests_total{level=\"error\"}
0\ncouchdb_couch_log_requests_total{level=\"info\"}
7\ncouchdb_couch_log_requests_total{level=\"notice\"}
18573\ncouchdb_couch_log_requests_total{level=\"warning\"} 0\n# TYPE
couchdb_couch_replicator_changes_manager_deaths_total
counter\ncouchdb_couch_replicator_changes_manager_deaths_total 0\n# TYPE
couchdb_couch_replicator_changes_queue_deaths_total
counter\ncouchdb_couch_replicator_changes_queue_deaths_total 0\n# TYPE
couchdb_couch_replicator_changes_read_failures_total
counter\ncouchdb_couch_replicator_changes_read_failures_total 0\n# TYPE
couchdb_couch_replicator_changes_reader_deaths_total
counter\ncouchdb_couch_replicator_changes_reader_deaths
_total 0\n# TYPE couchdb_couch_replicator_checkpoints_failure_total
counter\ncouchdb_couch_replicator_checkpoints_failure_total 0\n# TYPE
couchdb_couch_replicator_checkpoints_total
counter\ncouchdb_couch_replicator_checkpoints_total 0\n# TYPE
couchdb_couch_replicator_cluster_is_stable
gauge\ncouchdb_couch_replicator_cluster_is_stable 1\n# TYPE
couchdb_couch_replicator_connection_acquires_total
counter\ncouchdb_couch_replicator_connection_acquires_total 0\n# TYPE
couchdb_couch_replicator_connection_closes_total
counter\ncouchdb_couch_replicator_connection_closes_total 0\n# TYPE
couchdb_couch_replicator_connection_creates_total
counter\ncouchdb_couch_replicator_connection_creates_total 0\n# TYPE
couchdb_couch_replicator_connection_owner_crashes_total
counter\ncouchdb_couch_replicator_connection_owner_crashes_total 0\n# TYPE
couchdb_couch_replicator_connection_releases_total
counter\ncouchdb_couch_replicator_connection_releases_total 0\n# TYPE
couchdb_couch_replicator_connection_worker
_crashes_total
counter\ncouchdb_couch_replicator_connection_worker_crashes_total 0\n# TYPE
couchdb_couch_replicator_db_scans_total
counter\ncouchdb_couch_replicator_db_scans_total 1\n# TYPE
couchdb_couch_replicator_docs_completed_state_updates_total
counter\ncouchdb_couch_replicator_docs_completed_state_updates_total 0\n# TYPE
couchdb_couch_replicator_docs_db_changes_total
counter\ncouchdb_couch_replicator_docs_db_changes_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_created_total
counter\ncouchdb_couch_replicator_docs_dbs_created_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_deleted_total
counter\ncouchdb_couch_replicator_docs_dbs_deleted_total 0\n# TYPE
couchdb_couch_replicator_docs_dbs_found_total
counter\ncouchdb_couch_replicator_docs_dbs_found_total 2\n# TYPE
couchdb_couch_replicator_docs_failed_state_updates_total
counter\ncouchdb_couch_replicator_docs_failed_state_updates_total 0\n# TYPE
couchdb_couch_replicator_failed_starts_total
counter\ncouchdb_couch_replicator_fa
iled_starts_total 0\n# TYPE couchdb_couch_replicator_jobs_adds_total
counter\ncouchdb_couch_replicator_jobs_adds_total 0\n# TYPE
couchdb_couch_replicator_jobs_crashed
gauge\ncouchdb_couch_replicator_jobs_crashed 0\n# TYPE
couchdb_couch_replicator_jobs_crashes_total
counter\ncouchdb_couch_replicator_jobs_crashes_total 0\n# TYPE
couchdb_couch_replicator_jobs_duplicate_adds_total
counter\ncouchdb_couch_replicator_jobs_duplicate_adds_total 0\n# TYPE
couchdb_couch_replicator_jobs_pending
gauge\ncouchdb_couch_replicator_jobs_pending 0\n# TYPE
couchdb_couch_replicator_jobs_removes_total
counter\ncouchdb_couch_replicator_jobs_removes_total 0\n# TYPE
couchdb_couch_replicator_jobs_running
gauge\ncouchdb_couch_replicator_jobs_running 0\n# TYPE
couchdb_couch_replicator_jobs_starts_total
counter\ncouchdb_couch_replicator_jobs_starts_total 0\n# TYPE
couchdb_couch_replicator_jobs_stops_total
counter\ncouchdb_couch_replicator_jobs_stops_total 0\n# TYPE
couchdb_couch_replicator_jobs_total gauge\ncou
chdb_couch_replicator_jobs_total 0\n# TYPE
couchdb_couch_replicator_requests_total
counter\ncouchdb_couch_replicator_requests_total 0\n# TYPE
couchdb_couch_replicator_responses_failure_total
counter\ncouchdb_couch_replicator_responses_failure_total 0\n# TYPE
couchdb_couch_replicator_responses_total
counter\ncouchdb_couch_replicator_responses_total 0\n# TYPE
couchdb_couch_replicator_stream_responses_failure_total
counter\ncouchdb_couch_replicator_stream_responses_failure_total 0\n# TYPE
couchdb_couch_replicator_stream_responses_total
counter\ncouchdb_couch_replicator_stream_responses_total 0\n# TYPE
couchdb_couch_replicator_worker_deaths_total
counter\ncouchdb_couch_replicator_worker_deaths_total 0\n# TYPE
couchdb_couch_replicator_workers_started_total
counter\ncouchdb_couch_replicator_workers_started_total 0\n# TYPE
couchdb_auth_cache_requests_total counter\ncouchdb_auth_cache_requests_total
0\n# TYPE couchdb_auth_cache_misses_total
counter\ncouchdb_auth_cache_misses_total 0\n# TYPE
couchdb_collect_results_time_seconds
summary\ncouchdb_collect_results_time_seconds{quantile=\"0.5\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.75\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.9\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.95\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.99\"}
0.0\ncouchdb_collect_results_time_seconds{quantile=\"0.999\"}
0.0\ncouchdb_collect_results_time_seconds_sum
0.0\ncouchdb_collect_results_time_seconds_count 0\n# TYPE
couchdb_couch_server_lru_skip_total
counter\ncouchdb_couch_server_lru_skip_total 0\n# TYPE
couchdb_database_purges_total counter\ncouchdb_database_purges_total 0\n# TYPE
couchdb_database_reads_total counter\ncouchdb_database_reads_total 24\n# TYPE
couchdb_database_writes_total counter\ncouchdb_database_writes_total 0\n# TYPE
couchdb_db_open_time_seconds
summary\ncouchdb_db_open_time_seconds{quantile=\"0.5\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.75\"} 0.0\ncouchdb_db_open_
time_seconds{quantile=\"0.9\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.95\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.99\"}
0.0\ncouchdb_db_open_time_seconds{quantile=\"0.999\"}
0.0\ncouchdb_db_open_time_seconds_sum 0.0\ncouchdb_db_open_time_seconds_count
0\n# TYPE couchdb_dbinfo_seconds
summary\ncouchdb_dbinfo_seconds{quantile=\"0.5\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.75\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.9\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.95\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.99\"}
0.0\ncouchdb_dbinfo_seconds{quantile=\"0.999\"} 0.0\ncouchdb_dbinfo_seconds_sum
0.0\ncouchdb_dbinfo_seconds_count 0\n# TYPE couchdb_document_inserts_total
counter\ncouchdb_document_inserts_total 7\n# TYPE
couchdb_document_purges_failure_total
counter\ncouchdb_document_purges_failure_total 0\n# TYPE
couchdb_document_purges_success_total
counter\ncouchdb_document_purges_success_total 0\n# TYPE
couchdb_document_purges_total_total counter\ncouchdb_document_p
urges_total_total 0\n# TYPE couchdb_document_writes_total
counter\ncouchdb_document_writes_total 14\n# TYPE
couchdb_httpd_aborted_requests_total
counter\ncouchdb_httpd_aborted_requests_total 0\n# TYPE
couchdb_httpd_all_docs_timeouts_total
counter\ncouchdb_httpd_all_docs_timeouts_total 0\n# TYPE
couchdb_httpd_bulk_docs_seconds
summary\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.5\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.75\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.9\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.95\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.99\"}
0.0\ncouchdb_httpd_bulk_docs_seconds{quantile=\"0.999\"}
0.0\ncouchdb_httpd_bulk_docs_seconds_sum
0.0\ncouchdb_httpd_bulk_docs_seconds_count 0\n# TYPE
couchdb_httpd_bulk_requests_total counter\ncouchdb_httpd_bulk_requests_total
0\n# TYPE couchdb_httpd_clients_requesting_changes_total
counter\ncouchdb_httpd_clients_requesting_changes_total 0\n...">>,...}
extra: []
[error] 2022-06-23T06:23:45.953809Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.23685.4>
-------- CRASH REPORT Process couch_prometheus_server (<0.23685.4>) with 0
neighbors exited with reason:
{timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} at
gen_server:call/2(line:238) <= couch_stats_aggregator:fetch/0(line:44) <=
couch_prometheus_server:get_couchdb_stats/0(line:94) <=
couch_prometheus_server:refresh_metrics/0(line:87) <=
couch_prometheus_server:handle_info/2(line:74) <=
gen_server:try_dispatch/4(line:689) <= gen_server:handle_msg/6(line:765) <=
proc_lib:init_p_do_apply/3(line:226); initial_call:
{couch_prometheus_server,init,['Argument__1']}, ancestors:
[couch_prometheus_sup,<0.251.0>], message_queue_len: 1, links: [<0.252.0>],
dictionary: [], trap_exit: false, status: running, heap_size: 46422,
stack_size: 28, reductions: 5547311068
[error] 2022-06-23T06:23:45.954202Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.23685.4>
-------- CRASH REPORT Process couch_prometheus_server (<0.23685.4>) with 0
neighbors exited with reason:
{timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} at
gen_server:call/2(line:238) <= couch_stats_aggregator:fetch/0(line:44) <=
couch_prometheus_server:get_couchdb_stats/0(line:94) <=
couch_prometheus_server:refresh_metrics/0(line:87) <=
couch_prometheus_server:handle_info/2(line:74) <=
gen_server:try_dispatch/4(line:689) <= gen_server:handle_msg/6(line:765) <=
proc_lib:init_p_do_apply/3(line:226); initial_call:
{couch_prometheus_server,init,['Argument__1']}, ancestors:
[couch_prometheus_sup,<0.251.0>], message_queue_len: 1, links: [<0.252.0>],
dictionary: [], trap_exit: false, status: running, heap_size: 46422,
stack_size: 28, reductions: 5547311068
[error] 2022-06-23T06:23:46.044832Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.252.0>
-------- Supervisor couch_prometheus_sup had child couch_prometheus_server
started with couch_prometheus_server:start_link() at <0.23685.4> exit with
reason {timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} in context
child_terminated
[error] 2022-06-23T06:23:46.044957Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.252.0>
-------- Supervisor couch_prometheus_sup had child couch_prometheus_server
started with couch_prometheus_server:start_link() at <0.23685.4> exit with
reason {timeout,{gen_server,call,[couch_stats_aggregator,fetch]}} in context
child_terminated
[notice] 2022-06-23T06:23:46.364711Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30722.22> 4688407aa4 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 49
[notice] 2022-06-23T06:24:06.671559Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30883.22> 54b4dc40cb 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 150
[notice] 2022-06-23T06:24:09.260972Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30884.22> e34d093bcd 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 1660
[info] 2022-06-23T06:24:09.602627Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.40.0>
-------- SIGTERM received - shutting down
[info] 2022-06-23T06:24:09.602724Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.40.0>
-------- SIGTERM received - shutting down
[notice] 2022-06-23T06:24:14.600448Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local
<0.30916.22> c6896d1f00 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 56
[error] 2022-06-23T06:24:18.961753Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.811.0>
-------- gen_server <0.811.0> terminated with reason: killed
last msg: redacted
state:
{state,#Ref<0.3717146181.405405699.170850>,couch_replicator_doc_processor,nil,<<"_replicator">>,#Ref<0.3717146181.405274627.170851>,nil,[],true}
extra: []
[error] 2022-06-23T06:24:18.962005Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.811.0>
-------- gen_server <0.811.0> terminated with reason: killed
last msg: redacted
state:
{state,#Ref<0.3717146181.405405699.170850>,couch_replicator_doc_processor,nil,<<"_replicator">>,#Ref<0.3717146181.405274627.170851>,nil,[],true}
extra: []
[error] 2022-06-23T06:24:18.962362Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.811.0>
-------- CRASH REPORT Process (<0.811.0>) with 0 neighbors exited with reason:
killed at gen_server:decode_msg/9(line:475) <=
proc_lib:init_p_do_apply/3(line:226); initial_call:
{couch_multidb_changes,init,['Argument__1']}, ancestors:
[<0.692.0>,couch_replicator_sup,<0.668.0>], message_queue_len: 0, links: [],
dictionary: [], trap_exit: true, status: running, heap_size: 1598, stack_size:
28, reductions: 181192
[error] 2022-06-23T06:24:19.005925Z
couc...@couchdb-couchdb-0.couchdb-couchdb.couchdb.svc.cluster.local <0.811.0>
-------- CRASH REPORT Process (<0.811.0>) with 0 neighbors exited with reason:
killed at gen_server:decode_msg/9(line:475) <=
proc_lib:init_p_do_apply/3(line:226); initial_call:
{couch_multidb_changes,init,['Argument__1']}, ancestors:
[<0.692.0>,couch_replicator_sup,<0.668.0>], message_queue_len: 0, links: [],
dictionary: [], trap_exit: true, status: running, heap_size: 1598, stack_size:
28, reductions: 181192`
**Version of Helm and Kubernetes**:
Helm Version: 3.4.0
Kubernetes Version: 1.18.3
**What happened**:
The coordinator pod routinely restarts with the errors shown in the logs
above.
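One way to spot the slow probes in a log like the one above is to extract the `/_up` entries. This is a rough sketch, assuming the trailing integer of each access-log entry is the request duration in milliseconds (as in CouchDB's usual request log format); the function name and the 1000 ms threshold are illustrative choices, not part of CouchDB or the chart.

```python
import re

# Matches the tail of an access-log entry for the /_up endpoint:
# "... GET /_up <status> ok <duration>"
UP_REQUEST = re.compile(r"GET /_up (\d{3}) \S+ (\d+)")

def slow_up_requests(log_text, threshold_ms=1000):
    """Return (status, duration_ms) for /_up requests at or above threshold_ms."""
    # Collapse hard line wraps so an entry split across lines still matches.
    flat = " ".join(log_text.split())
    results = []
    for status, duration in UP_REQUEST.findall(flat):
        if int(duration) >= threshold_ms:
            results.append((int(status), int(duration)))
    return results

# Three entries copied from the log above (node name and timestamp omitted).
sample = """
<0.30690.22> b938e4d3fa 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 29613
<0.30697.22> d48973481d 192.168.230.108:5984 10.1.2.179 undefined GET /_up 200
ok 13
<0.30706.22> 45a001ddb1 192.168.230.108:5984 10.1.2.179 undefined GET /_up 500
ok 15095
"""
print(slow_up_requests(sample))  # → [(200, 29613), (500, 15095)]
```

Under that assumption, some `/_up` requests in the log took 15 to 35 seconds, including ones that eventually returned 200.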
**What you expected to happen**:
All pods should remain running without restarting.
**How to reproduce it** (as minimally and precisely as possible):
The issue occurs at random intervals; the pod restarts with the errors shown in the logs above.
**Anything else we need to know**:
We have the Helm chart deployed in both a testing and a production Kubernetes
environment, and both environments show the same behaviour. The database holds
only a small amount of data, and the pods have no CPU or memory restrictions.
The pods are configured with 16 GB local-path PVs. Average memory usage is
56 MB and CPU usage is below 0.03.