> On 26 Oct 2014, at 22:11 , Alexander Shorin <[email protected]> wrote:
>
> On Sun, Oct 26, 2014 at 11:25 PM, Jan Lehnardt <[email protected]> wrote:
>> Definitely, sorry for missing that bit.
>>
>
> No worries. Let's clear this up (:
>
>>> - If a node already has admin party fixed, should it accept new admin
>>> credentials?
>>
>> Good question, I’d say if an admin already exists, no new admin credentials
>> are needed.
>>
>>
>>> - Any reason to replace the 1-3 PUT requests to /_config with a single
>>> POST in this case?
>>
>> I’m not sure what the 1-3 PUT requests are?
>
> These ones:
> curl -XPUT http://localhost:5984/_config/admins/root -d '"password"' \
>   -H 'Content-Type: application/json'
> curl -XPUT http://localhost:5984/_config/httpd/bind_address -d '"0.0.0.0"' \
>   -H 'Content-Type: application/json'
> curl -XPUT http://localhost:5984/_config/httpd/port -d '"5984"' \
>   -H 'Content-Type: application/json'
>
> The last two are optional, just as the related fields are optional
> for the /_setup call.
Thanks, got it. If it is only these settings, then the one /_setup call
with action enable_cluster could be replaced by this. I think having
a deliberate /_setup that duplicates some /_config stuff is actually
helpful.
Incidentally, how does /_config behave in a cluster? Does it write
back to every node’s local.ini file?
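
To make the overlap concrete, here is a rough Python sketch of how a single enable_cluster body could be expanded into the /_config PUT requests quoted above. The body field names (`username`, `password`, `bind_address`, `port`) and the helper name `config_requests` are assumptions for illustration only; nothing in this thread pins them down. The sketch only builds the requests, it does not send anything.

```python
import json


def config_requests(body, base="http://localhost:5984"):
    """Expand a hypothetical enable_cluster body into /_config PUT requests.

    Returns a list of (method, url, json_payload) tuples. Only the admin
    entry is mandatory; bind_address and port are optional.
    """
    if body.get("action") != "enable_cluster":
        raise ValueError("expected action enable_cluster")
    user, password = body.get("username"), body.get("password")
    if not user or not password:
        raise ValueError("admin username and password are required")

    # /_config values are JSON strings, hence json.dumps on every value.
    # Server admins are assumed to live in the "admins" config section.
    requests = [("PUT", f"{base}/_config/admins/{user}", json.dumps(password))]
    if "bind_address" in body:
        requests.append(
            ("PUT", f"{base}/_config/httpd/bind_address",
             json.dumps(body["bind_address"]))
        )
    if "port" in body:
        # The port is stored as a string, matching the curl example.
        requests.append(
            ("PUT", f"{base}/_config/httpd/port", json.dumps(str(body["port"])))
        )
    return requests
```

Calling it with only `username` and `password` yields a single request, which mirrors the "last two are optional" remark.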
>
>>>> 3. Pick any one node, for simplicity use the first one, to be the
>>>> “setup coordination node”.
>>>> - this is a “master” node that manages the setup and requires all
>>>> other nodes to be able to see it and vice versa. Setup won’t work
>>>> with unavailable nodes (duh). The notion of “master” will be gone
>>>> once the setup is finished. At that point, the system has no
>>>> master node. Ignore I ever said “master”.
>>>>
>>>> a. Go to Fauxton / Cluster Setup, once we have enabled the cluster, the
>>>> UI shows an “Add Node” interface with the fields admin, and node:
>>>> - POST to /_setup with
>>>> {
>>>> "action": "add_node",
>>>> "admin": { // should be auto-filled from Fauxton
>>>> "user": "username",
>>>> "pass": "password"
>>>> },
>>>> "node": {
>>>> "host": "hostname",
>>>> ["port": 5984]
>>>> }
>>>> }
>>>>
>>>> b. as in a, but without the Fauxton bits, just POST to /_setup
>>>> - this request will do this:
>>>> - on the “setup coordination node”:
>>>> - check if we have an Erlang Cookie Secret. If not, generate
>>>> a UUID and set the Erlang cookie to that UUID.
>>>> // TBD: persist the cookie, so it survives restarts
>>>> - make a POST request to the node specified in the body above
>>>> using the admin credentials in the body above:
>>>> POST to http://username:password@node_b:5984/_setup with:
>>>> {
>>>> "action": "receive_cookie",
>>>> "cookie": "<secretcookie>"
>>>> }
>>>> // TBD: persist the cookie on node B, so it survives restarts
>>>>
>>>> - when the request to node B returns, we know the Erlang-level
>>>> inter-cluster communication is enabled and we can start adding
>>>> the node on the CouchDB level. To do that, the “setup
>>>> coordination node” does this to its own HTTP endpoint:
>>>> PUT /nodes/node_b:5984 or the same thing with internal APIs.
>>>>
>>>> - Repeat for all nodes.
>>>> - Fauxton keeps a list of all set up nodes for users to see.
>>>
>>> Question:
>>> - Since Fauxton already knows the admin credentials of all nodes and all
>>> the nodes are bound to the 0.0.0.0 interface (from the previous step), will
>>> Fauxton automate joining the nodes into the cluster? The point is to skip
>>> the "Repeat for all nodes" step.
>>
>> How does Fauxton know about the other nodes at this point?
>> (I guess since the Erlang cluster is already set up, it could expose that
>> info to Fauxton in a zeroconf kind of fashion and auto-populate the Fauxton
>> UI with nodes that then can be joined with just a click of a button.)
>
> Oh, right. The "should be auto-filled from Fauxton" comment confused
> me, so I thought Fauxton was already aware of the node list.
> Zeroconf is desirable, but that is another feature to add. So
> everything is ok here.
I don’t mean zeroconf specifically, but your comment gave me a new idea
about using Erlang cluster functions to auto-detect nodes. Now that
I think about it, I don’t think it works like I want, so let’s shelve
that part. We can opt into proper zeroconf anytime later.
>
>>> - If some of my nodes have different admin credentials, is this the
>>> blocker error case or should Fauxton ask me for these credentials?
>>
>> That’s why `add_node` takes a username and password as options, you
>> can set that up if you want. / This could also be made an error case.
>> It should certainly not be recommended.
>>
>
> Right, same confusion caused by the "auto-filled" comment (:
>
>>> - Any reason for replacing a regular request to /_nodes with a custom
>>> /_setup?
>>
>> I don’t know what /_nodes is. Do you mean /nodes? — The reason this isn’t
>> using /nodes at this point is that /nodes already has a special meaning
>> and I didn’t want to complicate the existing logic. In addition, /nodes
>> might have to be adjusted to carry the username and password of the target
>> CouchDB to do the setup (if we otherwise keep the proposed model, happy
>> to see alternatives, though!).
>>
>> If we can reduce all of what I outlined to `PUT /nodes/node_b|c|d`, that
>> would be nice. Fauxton could then offer the setup UI based on whether /nodes
>> has any entries. But I don’t know enough about the semantics and other
>> uses of /nodes, so I haven’t thought about this option too much.
>>
>
> *Bikeshedding alert*: shouldn't system database names start with a
> leading underscore? (:
> Yes, /nodes. Btw, nice idea about storing node credentials there -
> this should help with cluster management when admin credentials
> differ between nodes. I only worry that this would cause a conflict
> with the cassim logic.
/nodes is what BigCouch uses and I quickly conferred with rnewson on
IRC. /nodes is only available on :5986, i.e. the per-node administration
port. It is not part of the public API on :5984, so I think we can skip
the underscoring there for now. (Incidentally, that is why we can’t do
the setup just from Fauxton, because we need to write to :5986 from
:5984.)
>>> Point about cookie counts.
>>
>> Not sure I follow.
>
> I was trying to anticipate your reply. I could be wrong, but /nodes
> doesn't know anything about Erlang cookies or how to work with them,
> while your /_setup provides that functionality. I was trying to find
> reasons to avoid a special HTTP resource that is used only once in
> the whole cluster lifespan when other resources can do the same job.
> Setting up cookies is a good reason to have it after all.
Yeah, that’s exactly my point, I think if we can make /nodes (that I now
learn is node-only (see above)) understand Erlang cookie business, we
could avoid /_setup, but since /nodes is only available on :5986 and we
want new users to never have to see anything but :5984, we need
/_setup as a “proxy”. In addition, /nodes only “works” after the Erlang
cookie is set up in all nodes, so we are in a chicken and egg situation
here, and I think that’s the final thing that requires us to use /_setup.
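
The ordering matters here: cookie first, /nodes second. A rough Python sketch of what the coordination node would plan for one add_node request, under a few assumptions: `plan_add_node`, `ensure_cookie` and the `node_state` dict are hypothetical names, the credentials are passed as a basic-auth style tuple, and nothing is actually sent over the wire.

```python
import uuid


def ensure_cookie(node_state):
    """Return the Erlang cookie, generating a UUID-based one if none is set."""
    if not node_state.get("cookie"):
        node_state["cookie"] = uuid.uuid4().hex
    return node_state["cookie"]


def plan_add_node(node_state, body):
    """Plan the two steps the coordination node takes for an add_node body."""
    if body.get("action") != "add_node":
        raise ValueError("expected action add_node")
    admin, node = body["admin"], body["node"]
    host, port = node["host"], node.get("port", 5984)  # port is optional
    cookie = ensure_cookie(node_state)
    return [
        # Step 1: hand the cookie to the target node over its public port,
        # authenticating with the admin credentials from the request body.
        {
            "method": "POST",
            "url": f"http://{host}:{port}/_setup",
            "auth": (admin["user"], admin["pass"]),
            "json": {"action": "receive_cookie", "cookie": cookie},
        },
        # Step 2: only after step 1 returns, register the node locally.
        {"method": "PUT", "url": f"/nodes/{host}:{port}"},
    ]
```

A second add_node call against the same `node_state` reuses the cookie instead of generating a new one, which is what every later node needs in order to join.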
>>>> 4.a. When all nodes are added, click the [Finish Cluster Setup] button
>>>> in Fauxton.
>>>> - this does POST /_setup
>>>> {
>>>> "action": "finish_setup"
>>>> }
>>>>
>>>> b. Same as in a.
>>>>
>>>> - this manages the final setup bits, like creating the _users,
>>>> _replicator and _db_updates endpoints and whatever else is needed.
>>>> // TBD: collect what else is needed.
>>>
>>> This is the only useful thing that /_setup does from my current point
>>> of view - everything else just masks standard requests to the
>>> existing API.
>>
>> Which existing API in particular?
>>
>> If you mean that this all can be done over /_config and /nodes, yes totally,
>> but Fauxton on node_a can’t access /_config on node_b. That’s one of the
>> reasons of why I suggest using /_setup, so it can do all this from a single
>> node via Fauxton. The other reason is that it is a dedicated API end-point
>> that hides a lot of complexity instead of having end-users hit a bunch of
>> seemingly random endpoints (although this *could* be hidden in Fauxton maybe,
>> except for the cross domain issue).
>
> Yes, I mean /_config and /nodes. But why can't Fauxton access
> /_config on node_b? Especially if it knows the credentials and node_b
> is bound to the 0.0.0.0 interface.
Same origin policy in browsers, see above :)
> About API usage complexity: followers of the Fauxton-driven way
> really don't care which HTTP requests are made behind the scenes while
> a nice spinner loops in their browser. For fellows of the console way
> this isn't an issue either: small cluster installations are easily
> done via "seemingly random endpoints" following our guidelines; for
> bigger clusters these processes tend to be automated by provisioning
> tools.
Yeah, I can get behind the reasoning that complexity can be hidden
behind Fauxton and cli setup can be a bit more complex. I just like
the idea of making this a first-class setup citizen :)
>>>> ## The Setup Endpoint
>>>>
>>>> This is not a REST-y endpoint, it is a simple state machine operated
>>>> by HTTP POST with JSON bodies that have an `action` field.
>>>>
>>>> ### State 1: No Cluster Enabled
>>>>
>>>> This is right after starting a node for the first time, and any time
>>>> before the cluster is enabled as outlined above.
>>>>
>>>> GET /_setup
>>>> {"state": "cluster_disabled"}
>>>>
>>>> POST /_setup {"action":"enable_cluster"...} -> Transition to State 2
>>>> POST /_setup {"action":"enable_cluster"...} with empty admin user/pass or
>>>> invalid host/port or host/port not available -> Error
>>>> POST /_setup {"action":"anything_but_enable_cluster"...} -> Error
>>>>
>>>
>>> If "enable_cluster" only creates the admin and sets the bind address,
>>> could this step be skipped? The same actions are possible via
>>> regular config setup.
>>
>> Yes! It just needs to ensure these things are done. If Fauxton detects
>> they *are* done, it can skip the enable step and show the add_node interface
>> right away.
>
> Good!
>
>>>
>>>
>>>> ### State 2: Cluster enabled, admin user set, waiting for nodes to be
>>>> added.
>>>>
>>>> GET /_setup
>>>> {"state":"cluster_enabled","nodes":[]}
>>>>
>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>> POST /_setup {"action":"add_node"...} -> Stay in State 2, but return
>>>> "nodes":["node B"] on GET
>>>> POST /_setup {"action":"add_node"...} -> if target node not available,
>>>> Error
>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>> POST /_setup {"action":"finish_cluster"} -> Transition to State 3
>>>>
>>>
>>> Questions:
>>> - How many nodes are required to be added? 1? 2? 3?...
>>
>> Doesn’t matter.
>
> Then a case:
> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>
> will never happen since there will always be at least one node in the
> cluster: the one that runs the setup (:
From Fauxton, yes, but an API user could just call enable_cluster and
then finish_cluster, and they should get an appropriate error :)
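
To pin down that error case, here is a minimal in-memory Python sketch of the three states as described in the proposal. It is an illustration, not an implementation: `SetupStateMachine` and `SetupError` are made-up names, validation of the enable_cluster body and all real side effects are left out, and it uses `finish_cluster` as the action name, as the state descriptions do.

```python
class SetupError(Exception):
    """Raised for any request the current setup state does not allow."""


class SetupStateMachine:
    def __init__(self):
        self.state = "cluster_disabled"
        self.nodes = []

    def get(self):
        """Mimic GET /_setup."""
        if self.state == "cluster_disabled":
            return {"state": self.state}
        return {"state": self.state, "nodes": list(self.nodes)}

    def post(self, body, i_know_what_i_am_doing=False):
        """Mimic POST /_setup; returns the new state or raises SetupError."""
        action = body.get("action")
        if self.state == "cluster_disabled":
            # State 1: only enable_cluster is accepted.
            if action != "enable_cluster":
                raise SetupError("cluster not enabled yet")
            self.state = "cluster_enabled"
        elif self.state == "cluster_enabled":
            # State 2: add nodes, then finish once at least one was added.
            if action == "add_node":
                self.nodes.append(body["node"]["host"])
            elif action == "finish_cluster":
                if not self.nodes:
                    raise SetupError("no nodes set up")
                self.state = "cluster_finished"
            else:
                raise SetupError(f"{action} not allowed in state 2")
        else:
            # State 3: finish_cluster is a no-op, add_node needs the override.
            if action == "add_node" and i_know_what_i_am_doing:
                self.nodes.append(body["node"]["host"])
            elif action != "finish_cluster":
                raise SetupError(f"{action} not allowed in state 3")
        return self.get()
```

With this, enable_cluster followed directly by finish_cluster raises `SetupError`, while the same sequence with one add_node in between ends in `cluster_finished`.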
>>> - How to remove accidentally added node from cluster?
>>
>> Delete from /nodes database. Could be added as a UI element in Fauxton.
>
> That's what I worried about: we add nodes via /_setup, but have to
> remove them via /nodes. Consistency has to be preserved (:
Now that I know about /nodes being on :5986 only, /_setup needs a
remove_node action as well. Thanks for flagging that :)
>
>
>>>
>>>> ### State 3: Cluster set up, all nodes operational
>>>>
>>>> GET /_setup
>>>> {"state":"cluster_finished","nodes":["node a", "node b", ...]}
>>>>
>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>> POST /_setup {"action":"finish_cluster"...} -> Stay in State 3, do nothing
>>>> POST /_setup {"action":"add_node"...} -> Error
>>>> POST /_setup?i_know_what_i_am_doing=true {"action":"add_node"...} -> Add
>>>> node, stay in State 3.
>>>>
>>>> // TBD: we need to persist the setup state somewhere.
>>>>
>>>
>>> Questions:
>>> - Why is adding a new node after finish_cluster a special case that
>>> has to be marked with the "i_know_what_i_am_doing" parameter?
>>
>> Because I think it is not advisable to do this regularly, but someone might
>> want to do this regardless (see next).
>>
>>
>>> - How to enlarge / reduce the cluster after its setup or even disband it?
>>
>> Enlarge: see above.
>> Reduce: delete from /nodes
>> Disband: shut down all CouchDB processes :)
>>
>> I don’t know the BigCouch/Cloudant best practices for this. I’ll chalk
>> this down as a “needs input from Cloudant people” :)
>>
>
> You ran me into recursion with the (see next) and (see above) notes! Nice
> trick, but it is still unclear how to let your cluster grow - this isn't
> some exceptional case. Reducing (deliberately, not as a side effect of
> network issues) is what happens more rarely. +1 for having more info from
> Cloudant people (:
>
>
>>> Or is this not something /_setup should care about?
>>
>> In general, this isn’t really covered by the setup proposal here. I’d like
>> to keep this out of scope for now, but we should have good answers to that
>> going forward.
>
> Agreed.
>
>>> - What happens with /_setup resource after finish_cluster? Any case
>>> for it to be useful?
>>
>> Only for Fauxton to show the correct setup state.
>>
>
> If so, then I just figured out a better name for it: /_cluster
> - it sets up the cluster as you planned
> - it shows the cluster state as you planned
> - it allows managing cluster nodes in ways that aren't suitable
> for the /nodes API (like setting cookies)
> - it stays useful after cluster setup
> and it could handle other cluster-wide tasks.
>
> What do you think?
I’m not attached to any particular name; /_cluster or /_cluster_setup
both work for me :)
>>> - How could /_setup help with changing the admin password across all
>>> cluster nodes?
>>
>> At least on the first run setup, Fauxton can just keep the new password in
>> memory and pre-fill the add_node screens with the same username and password.
>> /_setup then transports it over.
>>
>> For later setups, I don’t know, as we would need the admin to enter the
>> password in plaintext so we can send it. Alternatively, we could use the
>> /_config API to read and send the PBKDF2 hash *waves hands*.
>>
>
> "Fauxton can just keep the new password in memory" opens the door to an
> issue when you accidentally refresh the page / close the tab / lose page
> memory in some other way. Not a flaw, just a case to keep in mind.
>
> As for sending (or replicating) the PBKDF2 hash, that looks good to me.
Yeah, that sounds like a more robust solution. We’d need something
similar for the HTTP auth secret.
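
For the hash idea, a small Python sketch of deriving the stored value once, so that only the hash has to travel between nodes and never the plaintext. It assumes the `-pbkdf2-<derived key>,<salt>,<iterations>` layout with HMAC-SHA1, a 20-byte key and 10 iterations; those parameters and the helper name `pbkdf2_config_value` are assumptions that need checking against the CouchDB source before anyone relies on them.

```python
import hashlib
import os


def pbkdf2_config_value(password, salt=None, iterations=10):
    """Build an admin password value in the assumed -pbkdf2- layout.

    salt is a hex string; a random one is generated when none is given.
    """
    if salt is None:
        salt = os.urandom(16).hex()
    derived = hashlib.pbkdf2_hmac(
        "sha1",                    # assumed PRF: HMAC-SHA1
        password.encode("utf-8"),
        salt.encode("ascii"),      # the hex string itself is used as salt bytes
        iterations,
        dklen=20,                  # assumed derived key length in bytes
    )
    return f"-pbkdf2-{derived.hex()},{salt},{iterations}"
```

Given the same password, salt and iteration count, every node derives the same value, so copying the finished string around is equivalent to hashing on each node.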
>>> - If I add a new node after "finish_cluster" setup, will it have all
>>> system databases (global_changes, cassim, _users...whatever else)
>>> created?
>>
>> That is unspecified at this point. I’d need more input from the Cloudant
>> people on this one. I’m happy to go either way, or make it an option for
>> later joined nodes.
>
> Ok. Let's wait for what the Cloudant people say. I'm pretty sure they
> already know the solution to all these problems or at least know their
> specifics.
>
> Thanks a lot, Jan! (:
Thank you Alex, this helps a lot! :)
* * *
I’ll start putting this proposal into the wiki, so we can see it evolve from
now on.
* * *
Best
Jan
--