[ https://issues.apache.org/jira/browse/COUCHDB-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271334#comment-15271334 ]
Jason Gordon commented on COUCHDB-3009:
---------------------------------------
Installed on CentOS 7 using Erlang 18 and CouchDB apache-couchdb-2.0.0-9f4103f
from April 18th.
We tried three scenarios; two exhibited this issue and one did not.
Scenario 1 (no issue): 3-node cluster on a single machine. Started three
nodes with dev/run.
Scenario 2 (database unreadable when the first node is down): 2-node cluster
on a single machine. Started with dev/run -n 2. Stop the first node and the
database is unavailable on the second.
Scenario 3 (databases unreadable when the first node is down): 3-node cluster
across two machines. Edited the configuration to use external IPs instead of
127.0.0.1. Started with dev/run -n 2 on one machine and dev/run -n 1 on the
second. Databases across the cluster become unavailable only when the first
node on the first machine is brought down (init:stop()).
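For scenario 3, the configuration change amounted to replacing the loopback
address in each dev node's Erlang node name with the machine's external IP so
the nodes on the two machines can see each other. A sketch of that edit
(the path, IP address, and cookie value below are illustrative, not the exact
values from this setup):

```
# e.g. dev/lib/node1/etc/vm.args on the first machine (path is an assumption)

# before:
#   -name [email protected]
# after:
-name [email protected]

# the Erlang cookie must be identical on every node in the cluster
-setcookie monster
```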
Please let me know if I can provide any further info.
Thanks!
Jason
> Cluster node databases unreadable when first node in cluster is down
> --------------------------------------------------------------------
>
> Key: COUCHDB-3009
> URL: https://issues.apache.org/jira/browse/COUCHDB-3009
> Project: CouchDB
> Issue Type: Bug
> Components: BigCouch, Database Core
> Affects Versions: 2.0.0
> Reporter: Jason Gordon
>
> After creating 3 nodes in a cluster, if the first node is taken down, the
> other two nodes' default databases (_global_changes, _metadata, _replicator,
> _users) become unreadable with error 500:
> {"error":"nodedown","reason":"progress not possible"}.
> Bringing the first node back up restores access. However, if the first node
> is down, restarting nodes 2 and 3 does not restore access and also causes
> the user databases to become unreachable.
> Note that only the first node created in the cluster causes this problem. As
> long as node 1 is up, nodes 2 and 3 can go up and down without issue.
> Log messages seen on nodes 2 and 3:
> 15:23:46.388 [notice] cassim_metadata_cache changes listener died
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.388 [error] Error in process <0.27407.0> on node
> '[email protected]' with exit value:
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.389 [notice] chttpd_auth_cache changes listener died
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.389 [error] Error in process <0.27414.0> on node
> '[email protected]' with exit value:
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:51.391 [error] gen_server chttpd_auth_cache terminated with reason: no
> case clause matching {error,read_failure} in
> chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187
> 15:23:51.391 [error] CRASH REPORT Process chttpd_auth_cache with 1 neighbours
> exited with reason: no case clause matching {error,read_failure} in
> chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187 in
> gen_server:terminate/7 line 826
> 15:23:51.391 [error] Supervisor chttpd_sup had child undefined started with
> chttpd_auth_cache:start_link() at <0.27413.0> exit with reason no case clause
> matching {error,read_failure} in chttpd_auth_cache:ensure_auth_ddoc_exists/2
> line 187 in context child_terminated
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)