Jason Gordon created COUCHDB-3009:
-------------------------------------

             Summary: Cluster node databases unreadable when first node in cluster is down
                 Key: COUCHDB-3009
                 URL: https://issues.apache.org/jira/browse/COUCHDB-3009
             Project: CouchDB
          Issue Type: Bug
          Components: BigCouch, Database Core
            Reporter: Jason Gordon


After creating a cluster of 3 nodes, if the first node is taken down, the 
default databases (_global_changes, _metadata, _replicator, _users) on the 
other two nodes become unreadable. Requests fail with HTTP 500 and the body 
{"error":"nodedown","reason":"progress not possible"}.
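
To make the symptom easier to check on each surviving node, here is a minimal 
Python sketch that requests each default database and flags the nodedown 
response described above. The helper names (http_get, is_nodedown, check_node) 
are hypothetical and not part of CouchDB. The sketch assumes the node answers 
unauthenticated GET requests; a node that requires credentials would need them 
added.

```python
import json
import urllib.error
import urllib.request

# Default databases named in this report.
SYSTEM_DBS = ["_global_changes", "_metadata", "_replicator", "_users"]


def http_get(url, timeout=5):
    """Return (status, body_text) for a GET, including HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8")


def is_nodedown(status, body):
    """True if the response is a 500 whose JSON body has error == "nodedown"."""
    if status != 500:
        return False
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and doc.get("error") == "nodedown"


def check_node(base_url, fetch=http_get):
    """Map each default database to "nodedown" or to the HTTP status returned.

    `fetch` is injectable so the check can be exercised without a live node.
    """
    results = {}
    for db in SYSTEM_DBS:
        status, body = fetch(f"{base_url}/{db}")
        results[db] = "nodedown" if is_nodedown(status, body) else status
    return results
```

Calling check_node with the base URL of node 2 or node 3 (for example 
check_node("http://127.0.0.1:25984"), where the address is only a placeholder 
for your own setup) while node 1 is stopped should report "nodedown" for the 
affected databases if the problem reproduces.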

Bringing the first node back up restores access.  However, while the first 
node is down, restarting nodes 2 and 3 does not restore access, and it also 
causes the user databases to become unreachable.

Note that only the first node created in the cluster causes this problem.  As 
long as node 1 is up, nodes 2 and 3 can go down and come back up without issue.

Log messages seen on nodes 2 and 3:

15:23:46.388 [notice] cassim_metadata_cache changes listener died 
{{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
15:23:46.388 [error] Error in process <0.27407.0> on node 
'[email protected]' with exit value:
{{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}

15:23:46.389 [notice] chttpd_auth_cache changes listener died 
{{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
15:23:46.389 [error] Error in process <0.27414.0> on node 
'[email protected]' with exit value:
{{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}

15:23:51.391 [error] gen_server chttpd_auth_cache terminated with reason: no 
case clause matching {error,read_failure} in 
chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187
15:23:51.391 [error] CRASH REPORT Process chttpd_auth_cache with 1 neighbours 
exited with reason: no case clause matching {error,read_failure} in 
chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187 in gen_server:terminate/7 
line 826
15:23:51.391 [error] Supervisor chttpd_sup had child undefined started with 
chttpd_auth_cache:start_link() at <0.27413.0> exit with reason no case clause 
matching {error,read_failure} in chttpd_auth_cache:ensure_auth_ddoc_exists/2 
line 187 in context child_terminated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
