[
https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852523#action_12852523
]
Fredrik Widlund commented on COUCHDB-722:
-----------------------------------------
The service-metrics database is also replicated, to the same target. The
CouchDB instances communicate directly with each other, without any proxy,
rewriting, or address translation.
I'm afraid the entries from the last mail were probably from a crash on the
opposite instance. The log below should be from the same crash as the first
one. This crash did not actually have the completion of the service-metrics
compaction directly before it.
[info] [<0.20666.1>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=288432 201
[info] [<0.274.0>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.274.0>] ** Generic server <0.274.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20538.1>,<0.20542.1>,<0.20545.1>,
<0.20547.1>,
{http_db,"http://127.0.0.1:5984/node-metrics/",
[],[],
[{"User-Agent","CouchDB/0.11.0"},
{"Accept","application/json"},
{"Accept-Encoding","gzip"}],
[],get,nil,
[{response_format,binary},
{inactivity_timeout,30000}],
10,500,nil},
{http_db,
"http://1.2.3.5:5984/node-metrics/",[],
[],
[{"User-Agent","CouchDB/0.11.0"},
{"Accept","application/json"},
{"Accept-Encoding","gzip"}],
[],get,nil,
[{response_format,binary},
{inactivity_timeout,30000}],
10,500,nil},
true,false,
["f3e3081db5a215dbaf9b2984f0552090",
{[{<<"target">>,
<<"http://1.2.3.5:5984/node-metrics">>},
{<<"source">>,
<<"http://127.0.0.1:5984/node-metrics">>},
{<<"continuous">>,true}]},
{user_ctx,null,
[<<"_admin">>],
<<"{couch_httpd_auth,
default_authentication_handler}">>}],
{1270124726131655,#Ref<0.0.11.78165>},
288246,
[...many, many session id entries]
[],false,288432,1163577,nil}
** Reason for termination ==
** {{badmatch,{stop,{db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
[{couch_rep,do_checkpoint,1},
{couch_rep,handle_cast,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
=ERROR REPORT==== 1-Apr-2010::14:25:26 ===
** Generic server <0.274.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20538.1>,<0.20542.1>,<0.20545.1>,
<0.20547.1>,
{http_db,"http://127.0.0.1:5984/node-metrics/",
[],[],
[{"User-Agent","CouchDB/0.11.0"},
{"Accept","application/json"},
{"Accept-Encoding","gzip"}],
[],get,nil,
[{response_format,binary},
{inactivity_timeout,30000}],
10,500,nil},
{http_db,
"http://1.2.3.5:5984/node-metrics/",[],
[],
[{"User-Agent","CouchDB/0.11.0"},
{"Accept","application/json"},
{"Accept-Encoding","gzip"}],
[],get,nil,
[{response_format,binary},
{inactivity_timeout,30000}],
10,500,nil},
true,false,
["f3e3081db5a215dbaf9b2984f0552090",
{[{<<"target">>,
<<"http://1.2.3.5:5984/node-metrics">>},
{<<"source">>,
<<"http://127.0.0.1:5984/node-metrics">>},
{<<"continuous">>,true}]},
{user_ctx,null,
[<<"_admin">>],
<<"{couch_httpd_auth,
default_authentication_handler}">>}],
{1270124726131655,#Ref<0.0.11.78165>},
288246,
[...many, many session id entries]
[],false,288432,1163577,nil}
** Reason for termination ==
** {{badmatch,{stop,{db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
[{couch_rep,do_checkpoint,1},
{couch_rep,handle_cast,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
[error] [<0.274.0>] {error_report,<0.31.0>,
{<0.274.0>,crash_report,
[[{initial_call,{couch_rep,init,['Argument__1']}},
{pid,<0.274.0>},
{registered_name,[]},
{error_info,
{exit,
{{badmatch,
{stop,
{db_not_found,
<<"http://127.0.0.1:5984/node-metrics/">>}}},
[{couch_rep,do_checkpoint,1},
{couch_rep,handle_cast,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
[{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
{ancestors,
[couch_rep_sup,couch_primary_services,couch_server_sup,<0.32.0>]},
{messages,[{'EXIT',<0.21084.1>,normal}]},
{links,[<0.81.0>]},
{dictionary,[{task_status_update,{{1270,124726,124009},0}}]},
{trap_exit,true},
{status,running},
{heap_size,10946},
{stack_size,24},
{reductions,29173458}],
[]]}}
=CRASH REPORT==== 1-Apr-2010::14:25:26 ===
[...follows below...]
Fredrik Widlund, CSO / Chief Architect, Qbrick
-----Original Message-----
From: Randall Leeds (JIRA) [mailto:[email protected]]
Sent: 1 April 2010 21:12
To: Fredrik Widlund
Subject: [jira] Commented: (COUCHDB-722) Continuous replication tasks fail
[
https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852510#action_12852510
]
Randall Leeds commented on COUCHDB-722:
---------------------------------------
I'm rather confused.
The compaction seems to be on the service-metrics database, but the replication
is between databases named node-metrics.
However, there's a POST to /service-metrics/_missing_revs on the target
database right around the time compaction completes. Replication performs this
operation. Are you using vhosts or some kind of proxy layer that's rewriting
any of your requests? Could you include a little bit more context at the end
where you put the ...? In particular I want to know if the replication was
using the service-metrics database at all.
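One way to check which databases the replicator is actually touching is to tally the access entries in the CouchDB log by database and endpoint. The following is a rough Python sketch, not part of CouchDB; it assumes each access entry sits on a single line in the form seen in this thread ([info] [<pid>] client - - 'METHOD' /db/path status), and the names ACCESS_RE and requests_per_database are made up for illustration.

```python
import re
from collections import Counter

# Assumed entry format, taken from the log excerpt in this thread:
#   [info] [<0.20666.1>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=288432 201
ACCESS_RE = re.compile(
    r"\[info\] \[<[\d.]+>\] (?P<client>\S+) - - '(?P<method>[A-Z]+)' "
    r"/(?P<db>[^/?\s]*)(?P<rest>\S*) (?P<status>\d{3})"
)


def requests_per_database(lines):
    """Count (database, method, endpoint) triples found in access log lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match is None:
            continue  # not an access entry, e.g. part of a crash report
        # Drop the query string so requests to the same endpoint group together.
        endpoint = match.group("rest").split("?", 1)[0]
        counts[(match.group("db"), match.group("method"), endpoint)] += 1
    return counts
```

Feeding it the lines of the log file (for example `requests_per_database(open(path))`, where `path` is wherever the CouchDB log lives on that machine) would show whether any `_missing_revs` or `_ensure_full_commit` requests hit service-metrics around the time of the crash.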
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
> Continuous replication tasks fail
> ---------------------------------
>
> Key: COUCHDB-722
> URL: https://issues.apache.org/jira/browse/COUCHDB-722
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.11
> Environment: Arch Linux, CouchDB 0.11
> Reporter: Fredrik Widlund
>
> CouchDB 0.11.0 replication tasks fail with the error below after working for
> anything from a few minutes to an hour. The replication below is of the
> type {"source":"http://127.0.0.1:5984/node-metrics",
> "target":"http://1.2.3.4:5984/node-metrics", "continuous":true} and the
> node-metrics database exists on both machines.
> The database is periodically compacted, which, and I'm speculating here,
> could be a contributing factor to the crash.
> Kind regards,
> Fredrik Widlund
> =CRASH REPORT==== 1-Apr-2010::14:25:26 ===
> crasher:
> initial call: couch_rep:init/1
> pid: <0.274.0>
> registered_name: []
> exception exit: {{badmatch,
> {stop,
> {db_not_found,
> <<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}
> in function gen_server:terminate/6
> ancestors: [couch_rep_sup,couch_primary_services,couch_server_sup,
> <0.32.0>]
> messages: [{'EXIT',<0.21084.1>,normal}]
> links: [<0.81.0>]
> dictionary: [{task_status_update,{{1270,124726,124009},0}}]
> trap_exit: true
> status: running
> heap_size: 10946
> stack_size: 24
> reductions: 29173458
> neighbours:
> [error] [<0.81.0>] {error_report,<0.31.0>,
> {<0.81.0>,supervisor_report,
> [{supervisor,{local,couch_rep_sup}},
> {errorContext,child_terminated},
> {reason,
> {{badmatch,
> {stop,
> {db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}},
> {offender,
> [{pid,<0.274.0>},
> {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
> {mfa,
> {gen_server,start_link,
> [couch_rep,
> ["f3e3081db5a215dbaf9b2984f0552090",
> {[{<<"target">>,
> <<"http://1.2.3.4:5984/node-metrics">>},
>
> {<<"source">>,<<"http://127.0.0.1:5984/node-metrics">>},
> {<<"continuous">>,true}]},
> {user_ctx,null,
> [<<"_admin">>],
> <<"{couch_httpd_auth,
> default_authentication_handler}">>}],
> []]}},
> {restart_type,temporary},
> {shutdown,1},
> {child_type,worker}]}]}}
> =SUPERVISOR REPORT==== 1-Apr-2010::14:25:26 ===
> Supervisor: {local,couch_rep_sup}
> Context: child_terminated
> Reason: {{badmatch,
> {stop,
> {db_not_found,
> <<"http://127.0.0.1:5984/node-metrics/">>}}},
> [{couch_rep,do_checkpoint,1},
> {couch_rep,handle_cast,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}
> Offender: [{pid,<0.274.0>},
> {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
> {mfa,
> {gen_server,start_link,
> [couch_rep,
> ["f3e3081db5a215dbaf9b2984f0552090",
> {[{<<"target">>,
> <<"http://1.2.3.4:5984/node-metrics">>},
> {<<"source">>,
> <<"http://127.0.0.1:5984/node-metrics">>},
> {<<"continuous">>,true}]},
> {user_ctx,null,
> [<<"_admin">>],
> <<"{couch_httpd_auth,
> default_authentication_handler}">>}],
> []]}},
> {restart_type,temporary},
> {shutdown,1},
> {child_type,worker}]
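For anyone trying to reproduce this, the continuous replication described in the quoted issue is the kind normally started by POSTing a JSON body to CouchDB's /_replicate endpoint. Below is a minimal Python sketch that only builds such a request and does not send it. The helper name build_replication_request is hypothetical, and the sketch assumes the server needs no credentials.

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target, continuous=True):
    """Build (but do not send) a POST /_replicate request for CouchDB."""
    # Same three fields as the replication document quoted in the issue.
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Values taken from the issue description; adjust for your own hosts.
request = build_replication_request(
    "http://127.0.0.1:5984",
    source="http://127.0.0.1:5984/node-metrics",
    target="http://1.2.3.4:5984/node-metrics",
)
```

Passing `request` to `urllib.request.urlopen` would actually start the replication, so that step is left out here on purpose.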