[
https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852524#action_12852524
]
Fredrik Widlund commented on COUCHDB-722:
-----------------------------------------
A grep collection of crashes, if it's helpful.
[r...@db3 scripts]# grep -B2 -A2 -E "\[error\] .*terminating" couchdb.stdout
[info] [<0.6318.0>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=69308 201
[info] [<0.291.0>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replication\
checkpoint
[error] [<0.291.0>] ** Generic server <0.291.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.6763.0>,<0.6767.0>,<0.6770.0>,<0.6772.0>,
--
[info] [<0.31361.1>] 127.0.0.1 - - 'POST'
/service-metrics/_ensure_full_commit?seq=98608 201
[info] [<0.273.0>] rebooting http://127.0.0.1:5984/service-metrics/ ->
http://1.2.3.5:5984/service-metrics/ from last known repli\
cation checkpoint
[error] [<0.273.0>] ** Generic server <0.273.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.31620.1>,<0.31625.1>,<0.31627.1>,
--
[info] [<0.24154.5>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=230868 201
[info] [<0.15120.5>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[error] [<0.15120.5>] ** Generic server <0.15120.5> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.24125.5>,<0.24129.5>,<0.24132.5>,
--
[info] [<0.4380.7>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=248027 201
[info] [<0.3606.7>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicatio\
n checkpoint
[error] [<0.3606.7>] ** Generic server <0.3606.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.4317.7>,<0.4322.7>,<0.4324.7>,<0.4326.7>,
--
[info] [<0.15414.7>] 127.0.0.1 - - 'POST'
/service-metrics/_ensure_full_commit?seq=231731 201
[info] [<0.15142.5>] rebooting http://127.0.0.1:5984/service-metrics/ ->
http://1.2.3.5:5984/service-metrics/ from last known rep\
lication checkpoint
[error] [<0.15142.5>] ** Generic server <0.15142.5> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15516.7>,<0.15521.7>,<0.15523.7>,
--
[info] [<0.26905.7>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=255490 201
[info] [<0.16250.7>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[error] [<0.16250.7>] ** Generic server <0.16250.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.27125.7>,<0.27129.7>,<0.27132.7>,
--
[info] [<0.8487.8>] 127.0.0.1 - - 'POST'
/service-metrics/_ensure_full_commit?seq=240461 201
[info] [<0.16228.7>] rebooting http://127.0.0.1:5984/service-metrics/ ->
http://1.2.3.5:5984/service-metrics/ from last known rep\
lication checkpoint
[error] [<0.16228.7>] ** Generic server <0.16228.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.8531.8>,<0.8536.8>,<0.8538.8>,<0.8540.8>,
--
[info] [<0.15483.8>] 127.0.0.1 - - 'POST'
/service-metrics/_ensure_full_commit?seq=247246 201
[info] [<0.15504.8>] rebooting http://127.0.0.1:5984/service-metrics/ ->
http://1.2.3.5:5984/service-metrics/ from last known rep\
lication checkpoint
[error] [<0.15504.8>] ** Generic server <0.15504.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15557.8>,<0.15563.8>,<0.15567.8>,
--
[info] [<0.15481.8>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[info] [<0.16982.8>] 1.2.3.5 - - 'POST' /node-metrics/_ensure_full_commit 201
[error] [<0.15481.8>] ** Generic server <0.15481.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.16926.8>,<0.16930.8>,<0.16933.8>,
--
[info] [<0.20255.8>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=269770 201
[info] [<0.18127.8>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[error] [<0.18127.8>] ** Generic server <0.18127.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20451.8>,<0.20455.8>,<0.20458.8>,
--
[info] [<0.30782.8>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=272628 201
[info] [<0.22327.8>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[error] [<0.22327.8>] ** Generic server <0.22327.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.30991.8>,<0.30995.8>,<0.30998.8>,
--
[info] [<0.20666.1>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=288432 201
[info] [<0.274.0>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replication\
checkpoint
[error] [<0.274.0>] ** Generic server <0.274.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20538.1>,<0.20542.1>,<0.20545.1>,
--
[info] [<0.28001.2>] 127.0.0.1 - - 'POST'
/node-metrics/_ensure_full_commit?seq=295892 201
[info] [<0.21122.1>] rebooting http://127.0.0.1:5984/node-metrics/ ->
http://1.2.3.5:5984/node-metrics/ from last known replicati\
on checkpoint
[error] [<0.21122.1>] ** Generic server <0.21122.1> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.27919.2>,<0.27923.2>,<0.27926.2>,
--
[info] [<0.16437.3>] 1.2.3.4 - - 'GET'
/service-metrics/_design/views/_view/allmetrics 200
[error] [<0.256.0>] couch_rep_httpc request failed after 10 retries:
http://1.2.3.5:5984/service-metrics/
[error] [<0.256.0>] ** Generic server <0.256.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15441.3>,<0.15446.3>,<0.15448.3>,
--
reductions: 4021
neighbours:
[error] [<0.15448.3>] ** Generic server <0.15448.3> terminating
** Last message in was {'EXIT',<0.15449.3>,
{{http_request_failed,
--
{child_type,worker}]
[error] [<0.15441.3>] ** Generic server <0.15441.3> terminating
** Last message in was {'EXIT',<0.256.0>,
{http_request_failed,
--
neighbours:
[error] [<0.9022.3>] couch_rep_httpc request failed after 10 retries:
http://1.2.3.5:5984/node-metrics/
[error] [<0.9022.3>] ** Generic server <0.9022.3> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15170.3>,<0.15174.3>,<0.15177.3>,
--
{child_type,worker}]
[error] [<0.15174.3>] ** Generic server <0.15174.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
{http_request_failed,
--
reductions: 7005
neighbours:
[error] [<0.15177.3>] ** Generic server <0.15177.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
{http_request_failed,
--
{stack_size,15},
{reductions,5200}]
[error] [<0.15170.3>] ** Generic server <0.15170.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
{http_request_failed,
--
[info] [<0.31343.3>] 127.0.0.1 - - 'POST'
/service-metrics/_ensure_full_commit?seq=292618 201
[info] [<0.18230.3>] rebooting http://127.0.0.1:5984/service-metrics/ ->
http://1.2.3.5:5984/service-metrics/ from last known rep\
lication checkpoint
[error] [<0.18230.3>] ** Generic server <0.18230.3> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.31889.3>,<0.31894.3>,<0.31896.3>,
--
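To see which replications the crashes above belong to, the grep output could be tallied per source database. The following is only a rough sketch: the function name `crashes_per_source` and the two regular expressions are assumptions based on the line shapes visible in this excerpt, not on any documented CouchDB log format. It pairs each "rebooting ... ->" line with a later "Generic server ... terminating" line from the same pid, so crashes that are not preceded by a reboot line (for example the `couch_rep_httpc` failures and the 'EXIT' cascades) are not counted.

```python
import re
from collections import Counter

# Matches the first half of the wrapped "rebooting <source> ->" info line.
REBOOT_RE = re.compile(r"\[info\] \[(<[\d.]+>)\] rebooting (\S+) ->")
# Matches the gen_server termination line logged by the same pid.
TERMINATING_RE = re.compile(
    r"\[error\] \[(<[\d.]+>)\] \*\* Generic server <[\d.]+> terminating"
)


def crashes_per_source(log_text):
    """Count terminated servers, grouped by the source URL they were rebooting."""
    source_by_pid = {}
    counts = Counter()
    for line in log_text.splitlines():
        reboot = REBOOT_RE.search(line)
        if reboot:
            # Remember which source URL this pid was restarting from.
            source_by_pid[reboot.group(1)] = reboot.group(2)
            continue
        crash = TERMINATING_RE.search(line)
        if crash and crash.group(1) in source_by_pid:
            # Attribute the crash to the source URL and forget the pid.
            counts[source_by_pid.pop(crash.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Assumes the same log file name as in the grep command above.
    with open("couchdb.stdout", encoding="utf-8", errors="replace") as handle:
        for source, count in crashes_per_source(handle.read()).most_common():
            print(count, source)
```

Run against the full log rather than the grep excerpt, this would show whether the service-metrics and node-metrics replications fail at similar rates.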
Fredrik Widlund, CSO / Chief Architect, Qbrick
Direct: +46 8 459 90 32 | Mobile: +46 76 899 96 66
Södra Hamnvägen 22 | 115 41 STOCKHOLM
Web and mobile: www.qbrick.com
-----Original Message-----
From: Randall Leeds (JIRA) [mailto:[email protected]]
Sent: 1 April 2010 21:12
To: Fredrik Widlund
Subject: [jira] Commented: (COUCHDB-722) Continuous replication tasks fail
[
https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852510#action_12852510
]
Randall Leeds commented on COUCHDB-722:
---------------------------------------
I'm rather confused.
The compaction seems to be on the service-metrics database, but the replication
is between databases named node-metrics.
However, there's a POST to /service-metrics/_missing_revs on the target
database right around the time compaction completes. Replication performs this
operation. Are you using vhosts or some kind of proxy layer that's rewriting
any of your requests? Could you include a little bit more context at the end
where you put the ...? In particular I want to know if the replication was
using the service-metrics database at all.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
> Continuous replication tasks fail
> ---------------------------------
>
> Key: COUCHDB-722
> URL: https://issues.apache.org/jira/browse/COUCHDB-722
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.11
> Environment: Arch Linux, CouchDB 0.11
> Reporter: Fredrik Widlund
>
> CouchDB 0.11.0 replication tasks fail with the error below after running for
> anywhere from a few minutes to an hour. The replication below is of the
> type {"source":"http://127.0.0.1:5984/node-metrics",
> "target":"http://1.2.3.4:5984/node-metrics", "continuous":true} and the
> node-metrics database exists on both machines.
> The database is periodically compacted, which (and I'm speculating here) could
> be a contributing factor to the crash.
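For anyone trying to reproduce this, a replication of the type described could be started by POSTing that JSON document to the node's `_replicate` endpoint. The sketch below is illustrative only: the helper name `build_replication_request` is hypothetical, and it assumes the local node listens on http://127.0.0.1:5984 as in the report. It builds the request without sending it.

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target, continuous=True):
    """Build (but do not send) a POST to the _replicate endpoint."""
    body = json.dumps(
        {"source": source, "target": target, "continuous": continuous}
    )
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_replication_request(
        "http://127.0.0.1:5984",
        "http://127.0.0.1:5984/node-metrics",
        "http://1.2.3.4:5984/node-metrics",
    )
    # Sending needs a reachable CouchDB node; uncomment to trigger replication.
    # print(urllib.request.urlopen(req).read().decode("utf-8"))
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```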
> Kind regards,
> Fredrik Widlund
> =CRASH REPORT==== 1-Apr-2010::14:25:26 ===
> crasher:
> initial call: couch_rep:init/1
> pid: <0.274.0>
> registered_name: []
> exception exit: {{badmatch,
> {stop,
> {db_not_found,
> <<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}
> in function gen_server:terminate/6
> ancestors: [couch_rep_sup,couch_primary_services,couch_server_sup,
> <0.32.0>]
> messages: [{'EXIT',<0.21084.1>,normal}]
> links: [<0.81.0>]
> dictionary: [{task_status_update,{{1270,124726,124009},0}}]
> trap_exit: true
> status: running
> heap_size: 10946
> stack_size: 24
> reductions: 29173458
> neighbours:
> [error] [<0.81.0>] {error_report,<0.31.0>,
> {<0.81.0>,supervisor_report,
> [{supervisor,{local,couch_rep_sup}},
> {errorContext,child_terminated},
> {reason,
> {{badmatch,
> {stop,
> {db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}},
> {offender,
> [{pid,<0.274.0>},
> {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
> {mfa,
> {gen_server,start_link,
> [couch_rep,
> ["f3e3081db5a215dbaf9b2984f0552090",
> {[{<<"target">>,
> <<"http://1.2.3.4:5984/node-metrics">>},
> {<<"source">>,
> <<"http://127.0.0.1:5984/node-metrics">>},
> {<<"continuous">>,true}]},
> {user_ctx,null,
> [<<"_admin">>],
> <<"{couch_httpd_auth,
> default_authentication_handler}">>}],
> []]}},
> {restart_type,temporary},
> {shutdown,1},
> {child_type,worker}]}]}}
> =SUPERVISOR REPORT==== 1-Apr-2010::14:25:26 ===
> Supervisor: {local,couch_rep_sup}
> Context: child_terminated
> Reason: {{badmatch,
> {stop,
> {db_not_found,
> <<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}
> Offender: [{pid,<0.274.0>},
> {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
> {mfa,
> {gen_server,start_link,
> [couch_rep,
> ["f3e3081db5a215dbaf9b2984f0552090",
> {[{<<"target">>,
> <<"http://1.2.3.4:5984/node-metrics">>},
> {<<"source">>,
> <<"http://127.0.0.1:5984/node-metrics">>},
> {<<"continuous">>,true}]},
> {user_ctx,null,
> [<<"_admin">>],
> <<"{couch_httpd_auth,
> default_authentication_handler}">>}],
> []]}},
> {restart_type,temporary},
> {shutdown,1},
> {child_type,worker}]