[
https://issues.apache.org/jira/browse/COUCHDB-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905458#comment-13905458
]
Darren Gibbard commented on COUCHDB-2070:
-----------------------------------------
Another example of a timeout issue, probably related, is the following log of
the compaction daemon dying. The really concerning thing with this one, though,
is the complaints that the replicator/replication databases had disappeared!
They did at least return afterwards.
{noformat}
[Wed, 19 Feb 2014 09:24:16 GMT] [error] [<0.27372.16>] {error_report,<0.30.0>,
{<0.27372.16>,supervisor_report,
[{supervisor,{local,couch_secondary_services}},
{errorContext,child_terminated},
{reason,
{compaction_loop_died,
{timeout,
{gen_server,call,[couch_server,get_server]}}}},
{offender,
[{pid,<0.7610.20>},
{name,compaction_daemon},
{mfargs,{couch_compaction_daemon,start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]}]}}
[Wed, 19 Feb 2014 09:24:17 GMT] [error] [<0.23575.19>] Replicator, request GET
to
"http://admin:*****@192.168.24.92:5984/pim/_changes?feed=continuous&style=all_docs&since=47931809&heartbeat=10000"
failed due to error {error,req_timedout}
[Wed, 19 Feb 2014 09:24:17 GMT] [error] [<0.5660.20>] Uncaught error in HTTP
request: {exit,
{timeout,
{gen_server,call,
[couch_server,
get_server]}}}
[Wed, 19 Feb 2014 09:24:18 GMT] [error] [<0.5660.20>] httpd 500 error response:
{"error":"timeout","reason":"{gen_server,call,[couch_server,get_server]}"}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.7793.20>] ** Generic server
couch_compaction_daemon terminating
** Last message in was {'EXIT',<0.7773.20>,
{timeout,
{gen_server,call,[couch_server,get_server]}}}
** When Server state == {state,<0.7773.20>}
** Reason for termination ==
** {compaction_loop_died,
{timeout,{gen_server,call,[couch_server,get_server]}}}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.7793.20>] {error_report,<0.30.0>,
{<0.7793.20>,crash_report,
[[{initial_call,
{couch_compaction_daemon,init,['Argument__1']}},
{pid,<0.7793.20>},
{registered_name,couch_compaction_daemon},
{error_info,
{exit,
{compaction_loop_died,
{timeout,
{gen_server,call,[couch_server,get_server]}}},
[{gen_server,terminate,6,
[{file,"gen_server.erl"},{line,744}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
{ancestors,
[couch_secondary_services,couch_server_sup,
<0.31.0>]},
{messages,[]},
{links,[<0.27372.16>]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,610},
{stack_size,27},
{reductions,3109}],
[]]}}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.27372.16>] {error_report,<0.30.0>,
{<0.27372.16>,supervisor_report,
[{supervisor,{local,couch_secondary_services}},
{errorContext,child_terminated},
{reason,
{compaction_loop_died,
{timeout,
{gen_server,call,[couch_server,get_server]}}}},
{offender,
[{pid,<0.7793.20>},
{name,compaction_daemon},
{mfargs,{couch_compaction_daemon,start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]}]}}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.27372.16>] {error_report,<0.30.0>,
{<0.27372.16>,supervisor_report,
[{supervisor,{local,couch_secondary_services}},
{errorContext,shutdown},
{reason,reached_max_restart_intensity},
{offender,
[{pid,<0.7793.20>},
{name,compaction_daemon},
{mfargs,
{couch_compaction_daemon,start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]}]}}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.23575.19>] Replicator, request GET
to
"http://admin:*****@192.168.24.92:5984/pim/_changes?feed=continuous&style=all_docs&since=47931809&heartbeat=10000"
failed due to error {error,connection_closed}
[Wed, 19 Feb 2014 09:24:21 GMT] [error] [<0.83.0>] {error_report,<0.30.0>,
{<0.83.0>,supervisor_report,
[{supervisor,{local,couch_server_sup}},
{errorContext,child_terminated},
{reason,shutdown},
{offender,
[{pid,<0.27372.16>},
{name,couch_secondary_services},
{mfargs,{couch_secondary_sup,start_link,[]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]}]}}
[Wed, 19 Feb 2014 09:25:13 GMT] [error] [<0.11534.20>] Could not open file
/opt/couchdb/dbs/replicator.couch: no such file or directory
[Wed, 19 Feb 2014 09:25:19 GMT] [error] [<0.11791.20>] Could not open file
/opt/couchdb/dbs/replication.couch: no such file or directory
{noformat}
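For anyone comparing logs across nodes, the excerpt above can be summarised by counting how many log entries contain each error signature. The following is only a rough sketch: the labels, the `SIGNATURES` table and the function names are my own invention, and the header pattern assumes every entry starts with the `[timestamp] [level] [<pid>]` prefix seen in the excerpt. Entries wrap across lines, so whitespace is removed before matching.

```python
import re
from collections import Counter

# Header that starts each entry in the excerpt above, e.g.
# "[Wed, 19 Feb 2014 09:24:16 GMT] [error] [<0.27372.16>] ..."
ENTRY_HEADER = re.compile(r"^\[[^\]]+\] \[\w+\] \[<[\d.]+>\]")

# Substrings copied from the excerpt; the labels are hypothetical names
# chosen for this sketch, not anything CouchDB itself emits.
SIGNATURES = {
    "couch_server_timeout": "{gen_server,call,[couch_server,get_server]}",
    "req_timedout": "{error,req_timedout}",
    "connection_closed": "{error,connection_closed}",
    "max_restart_intensity": "reached_max_restart_intensity",
    "missing_db_file": "Could not open file",
}


def _squash(text):
    """Drop all whitespace so terms wrapped across lines still match."""
    return "".join(text.split())


def split_entries(log_text):
    """Group raw lines into entries, starting a new one at each header."""
    entries, current = [], []
    for line in log_text.splitlines():
        if ENTRY_HEADER.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def count_signatures(log_text):
    """Return a Counter of label -> number of entries containing it."""
    counts = Counter()
    for entry in split_entries(log_text):
        squashed = _squash(entry)
        for label, needle in SIGNATURES.items():
            if _squash(needle) in squashed:
                counts[label] += 1
    return counts
```

Calling `count_signatures(open(path).read())` on a node's log file (where `path` is wherever that node writes its CouchDB log) gives a per-signature tally that can be compared between the remote and central nodes.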
> [1.4.0] CouchDB Replication Crashes
> -----------------------------------
>
> Key: COUCHDB-2070
> URL: https://issues.apache.org/jira/browse/COUCHDB-2070
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Components: Replication
> Reporter: Darren Gibbard
>
> Hi all,
> I have an issue at the moment that appears to have followed me from v1.2.1
> with Erlang R14, through to an upgrade to v1.4.0 with R16B01.
> I have 20 "remote" nodes and one "central" node; each of the remote
> instances is configured with bi-directional replication (i.e. no replication
> is defined on the central node directly). There is a single main database of
> ~600,000 documents at ~11GB in size.
> On the remote nodes, and more frequently on the central node, I get *huge*
> (3000+ line) errors in the logs, seemingly intermittently; I have yet to
> track down the root cause. Open file handles and ERL_MAX_PORTS are set to
> values upwards of 16k.
> Other stats:
> {noformat}
> $ sudo su - couchdb -c "lsof | grep -c ."
> 1511
> $ sudo netstat -npla | grep "ESTAB" | grep -c .
> 310
> $ ps -ef | grep -c "^couchdb"
> 19
> {noformat}
> An example log from a Remote node is:
> http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01.20140218.log
> An example log from the Central node is:
> http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01_central.20140218.log
> The main error line appears to be "{error,{error,req_timedout}}}}", for
> either "_bulk_docs" on the remote nodes or "_revs_diff" on the central node.