Sean Lang created COUCHDB-3320:
----------------------------------
Summary: confusion with adding node to existing cluster
Key: COUCHDB-3320
URL: https://issues.apache.org/jira/browse/COUCHDB-3320
Project: CouchDB
Issue Type: Question
Reporter: Sean Lang
I've got a cluster running on two separate servers and want to add a third.
I read the docs at http://docs.couchdb.org/en/2.0.0/cluster/nodes.html. Initially,
the `_membership` endpoint looked like this on both nodes:
```
$ curl -X GET "http://0.0.0.0:5984/_membership" --user root
{"all_nodes":["[email protected]","[email protected]"],"cluster_nodes":["[email protected]","[email protected]"]}
```
The server I'm trying to add is at `192.168.1.226`. Running `curl -X PUT
"http://0.0.0.0:5986/_nodes/[email protected]" -d {} --user root` from the
server at `192.168.1.214` didn't work: the logs on `192.168.1.226` showed
`Connection attempt from disallowed node '[email protected]'` and
`Connection attempt from disallowed node '[email protected]'`.
That makes sense, because the command never supplied the password for
`192.168.1.226`, so there was no reason for it to work.
I then removed the node from the cluster again with `curl -X DELETE
"http://0.0.0.0:5986/_nodes/[email protected]?rev=1-967a00dff5e02add41819138abb3284d"
-d {} --user root`. The Node Management docs don't mention that the revision
ID is required, or that the `_nodes` database behaves like a normal database;
this seems to have confused at least [one
person](https://groups.google.com/d/msg/couchdb-user-archive/54tEryERBiI/O0GKBo_NBAAJ).
After reading through the dev cluster [setup
script](https://github.com/apache/couchdb/blob/master/dev/run#L422), I tried
running the following from the server at `192.168.1.214`:
```
$ curl -X POST -H "Content-Type: application/json" \
    "http://0.0.0.0:5984/_cluster_setup" \
    -d '{"action":"add_node","host":"192.168.1.226","port":5984,"username":"root","password":"XXXXXXX"}' \
    --user root
```
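To script this step (for example from an automation job), the same `_cluster_setup` call can be built in Python. This is a minimal sketch: the payload fields are exactly those in the curl command above, while `add_node_request` and its admin-credential parameters are hypothetical, standing in for curl's `--user root`.

```python
import base64
import json
import urllib.request


def add_node_request(
    coordinator: str,
    host: str,
    port: int,
    username: str,
    password: str,
    admin_user: str,
    admin_password: str,
) -> urllib.request.Request:
    """Build POST /_cluster_setup with action=add_node (hypothetical helper)."""
    # Same body as the curl -d argument: where the new node is and how to log in to it.
    payload = {
        "action": "add_node",
        "host": host,
        "port": port,
        "username": username,
        "password": password,
    }
    # Basic auth for the coordinating node, like curl's --user.
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        f"{coordinator.rstrip('/')}/_cluster_setup",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(add_node_request(...))`. Building the request separately keeps the payload easy to inspect before anything touches the cluster.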
That almost worked. The membership for `192.168.1.214` was correct:
```
{"all_nodes":["[email protected]","[email protected]","[email protected]"],"cluster_nodes":["[email protected]","[email protected]","[email protected]"]}
```
But `192.168.1.226` wasn't connected to `192.168.1.202`: its `all_nodes` list has only two entries, while `cluster_nodes` lists all three:
```
{"all_nodes":["[email protected]","[email protected]"],"cluster_nodes":["[email protected]","[email protected]","[email protected]"]}
```
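The mismatch can be spotted mechanically by diffing the two lists in a `_membership` response. A minimal Python sketch, assuming (as the outputs above suggest) that `all_nodes` lists the nodes a server is currently connected to and `cluster_nodes` lists the entries in `_nodes`. The `n@...` names are hypothetical placeholders, not the real node names.

```python
from __future__ import annotations

import json


def unconnected_nodes(membership: dict) -> list[str]:
    """Nodes in cluster_nodes (the _nodes database) that are missing from
    all_nodes (the nodes this server is currently connected to)."""
    return sorted(set(membership["cluster_nodes"]) - set(membership["all_nodes"]))


# Hypothetical node names; substitute the real ones from _membership.
sample = json.loads(
    '{"all_nodes":["n@192.168.1.214","n@192.168.1.226"],'
    '"cluster_nodes":["n@192.168.1.202","n@192.168.1.214","n@192.168.1.226"]}'
)
print(unconnected_nodes(sample))  # ['n@192.168.1.202']
```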
Logs on `192.168.1.226` showed `global: '[email protected]' failed to
connect to '[email protected]'` and `Connection attempt from disallowed
node '[email protected]'`, but I don't understand why.
I then rebooted `192.168.1.226`, deleted its entry from the `_nodes` database
again, and re-added it by running exactly the same command on `192.168.1.214`
as before. This time it seemed to work: all servers show full connectivity to
each other. However, I have no idea why it worked the second time, or whether
I'm doing something horribly wrong.
Is this the correct way to add nodes to a cluster? Should the Node Management
docs be updated? I want to make sure I'm doing this right before I automate the
whole process with Kubernetes.
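For the Kubernetes automation, one option is a readiness check that polls every server's `_membership` until all of them agree. This is a minimal Python sketch under stated assumptions: `all_nodes` reflects the nodes each server is connected to, and `cluster_nodes` reflects the `_nodes` database. `fetch` is a hypothetical callable that returns one server's parsed `_membership` JSON, for example built on `urllib.request`. It is injected so the check can be exercised without a live cluster.

```python
from __future__ import annotations

import time
from collections.abc import Callable


def is_converged(views: dict[str, dict]) -> bool:
    """True if every server reports the same cluster_nodes and is connected to all of them."""
    cluster_sets = {frozenset(v["cluster_nodes"]) for v in views.values()}
    if len(cluster_sets) != 1:
        return False  # no servers, or servers disagree about membership
    expected = next(iter(cluster_sets))
    return all(expected <= set(v["all_nodes"]) for v in views.values())


def wait_until_converged(
    hosts: list[str],
    fetch: Callable[[str], dict],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Poll each host's membership until converged or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if is_converged({h: fetch(h) for h in hosts}):
                return True
        except OSError:
            pass  # a server is unreachable (URLError subclasses OSError); retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A `False` result would correspond to the state seen on `192.168.1.226` above, where one server's `all_nodes` lagged behind `cluster_nodes`.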
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)