[ https://issues.apache.org/jira/browse/COUCHDB-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275790#comment-15275790 ]

Robert Newson commented on COUCHDB-3009:
----------------------------------------

Please show the results for curl localhost:5984/_membership for each node (at 
each stage), and curl localhost:5984/$dbname/_shards from just after db 
creation.

My suspicion is that the cluster was not yet connected when the database was 
created, so the database was never present on the remaining nodes at all; the 
data above will shed light.
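For illustration, here is how that data would distinguish the two cases. In a 
fully connected cluster, every configured member in cluster_nodes also appears 
in all_nodes, and _shards lists all three nodes as shard hosts. A minimal 
Python sketch over illustrative responses (the node names and JSON below are 
made up for the example, not taken from this report):

```python
import json

# Illustrative _membership response, as seen after the nodes were joined;
# cluster_nodes lists configured members, all_nodes the nodes this node
# currently knows about. A member missing from all_nodes would suggest a
# connectivity problem.
membership = json.loads("""
{
  "all_nodes":     ["[email protected]", "[email protected]", "[email protected]"],
  "cluster_nodes": ["[email protected]", "[email protected]", "[email protected]"]
}
""")
unreachable = set(membership["cluster_nodes"]) - set(membership["all_nodes"])
print(sorted(unreachable))  # empty list: all members currently connected

# Illustrative _shards response consistent with the suspicion above: the
# database was created before the join, so every shard range is hosted
# only on node1 even though membership now looks healthy.
shards = json.loads("""
{"shards": {"00000000-7fffffff": ["[email protected]"],
            "80000000-ffffffff": ["[email protected]"]}}
""")
hosts = {n for replicas in shards["shards"].values() for n in replicas}
print(sorted(hosts))  # only node1: the db never reached nodes 2 and 3
```

If _shards looks like the second response, taking node1 down would leave no 
replica to answer reads, matching the "nodedown / progress not possible" error.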

I'm 100% sure that the original code from Cloudant behaves appropriately. Most 
likely this is a setup issue.

> Cluster node databases unreadable when first node in cluster is down
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-3009
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3009
>             Project: CouchDB
>          Issue Type: Bug
>          Components: BigCouch, Database Core
>    Affects Versions: 2.0.0
>            Reporter: Jason Gordon
>
> After creating 3 nodes in a cluster, if the first node is taken down, the 
> other two nodes' default databases (_global_changes, _metadata, _replicator, 
> _users) become unreadable with the error 500 
> {"error":"nodedown","reason":"progress not possible"}.
> Bringing up the first node restores access. However, if the first node is 
> down, restarting nodes 2 and 3 does not restore access and also causes the 
> user databases to become unreachable.
> Note: only the first node created in the cluster causes this problem. As 
> long as node 1 is up, nodes 2 and 3 can go up and down without issue.
> Log messages seen on nodes 2 and 3:
> 15:23:46.388 [notice] cassim_metadata_cache changes listener died 
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.388 [error] Error in process <0.27407.0> on node 
> '[email protected]' with exit value:
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.389 [notice] chttpd_auth_cache changes listener died 
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.389 [error] Error in process <0.27414.0> on node 
> '[email protected]' with exit value:
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:51.391 [error] gen_server chttpd_auth_cache terminated with reason: no 
> case clause matching {error,read_failure} in 
> chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187
> 15:23:51.391 [error] CRASH REPORT Process chttpd_auth_cache with 1 neighbours 
> exited with reason: no case clause matching {error,read_failure} in 
> chttpd_auth_cache:ensure_auth_ddoc_exists/2 line 187 in 
> gen_server:terminate/7 line 826
> 15:23:51.391 [error] Supervisor chttpd_sup had child undefined started with 
> chttpd_auth_cache:start_link() at <0.27413.0> exit with reason no case clause 
> matching {error,read_failure} in chttpd_auth_cache:ensure_auth_ddoc_exists/2 
> line 187 in context child_terminated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
