> On 31 Oct 2014, at 14:33 , Jan Lehnardt <[email protected]> wrote:
>
>>
>> On 26 Oct 2014, at 22:11 , Alexander Shorin <[email protected]> wrote:
>>
>> On Sun, Oct 26, 2014 at 11:25 PM, Jan Lehnardt <[email protected]> wrote:
>>> Definitely, sorry for missing that bit.
>>>
>>
>> No worries. Let's clear this up (:
>>
>>>> - If a node already has admin party fixed, should it accept new admin
>>>> credentials?
>>>
>>> Good question, I’d say if an admin already exists, no new admin credentials
>>> are needed.
>>>
>>>
>>>> - Any reasons to replace 1-3 PUT requests to /_config with a single POST
>>>> one in this case?
>>>
>>> I’m not sure what the 1-3 PUT requests are?
>>
>> These ones:
>> curl -XPUT http://localhost:5984/_config/admins/root -d '"password"' -H 'Content-Type: application/json'
>> curl -XPUT http://localhost:5984/_config/httpd/bind_address -d '"0.0.0.0"' -H 'Content-Type: application/json'
>> curl -XPUT http://localhost:5984/_config/httpd/port -d '"5984"' -H 'Content-Type: application/json'
>>
>> The last two are optional, just as the related fields are optional
>> for the /_setup call.
>
> Thanks, got it. If it is only these settings, then the one /_setup call
> with action enable_cluster could be replaced by this. I think having
> a deliberate /_setup that duplicates some /_config stuff is actually
> helpful.
>
> Incidentally, how does /_config behave in a cluster? Does it write
> back to all nodes’ local.ini files?
>
>
>>
>>>>> 3. Pick any one node, for simplicity use the first one, to be the
>>>>> “setup coordination node”.
>>>>> - this is a “master” node that manages the setup and requires all
>>>>> other nodes to be able to see it and vice versa. Setup won’t work
>>>>> with unavailable nodes (duh). The notion of “master” will be gone
>>>>> once the setup is finished. At that point, the system has no
>>>>> master node. Ignore I ever said “master”.
>>>>>
>>>>> a. Go to Fauxton / Cluster Setup, once we have enabled the cluster, the
>>>>> UI shows an “Add Node” interface with the fields admin, and node:
>>>>> - POST to /_setup with
>>>>>   {
>>>>>     "action": "add_node",
>>>>>     "admin": { // should be auto-filled from Fauxton
>>>>>       "user": "username",
>>>>>       "pass": "password"
>>>>>     },
>>>>>     "node": {
>>>>>       "host": "hostname",
>>>>>       ["port": 5984]
>>>>>     }
>>>>>   }
>>>>>
>>>>> b. as in a, but without the Fauxton bits, just POST to /_setup
>>>>> - this request will do this:
>>>>> - on the “setup coordination node”:
>>>>> - check if we have an Erlang Cookie Secret. If not, generate
>>>>> a UUID and set the erlang cookie to that UUID.
>>>>> // TBD: persist the cookie, so it survives restarts
>>>>> - make a POST request to the node specified in the body above
>>>>> using the admin credentials in the body above:
>>>>> POST to http://username:password@node_b:5984/_setup with:
>>>>>   {
>>>>>     "action": "receive_cookie",
>>>>>     "cookie": "<secretcookie>"
>>>>>   }
>>>>> // TBD: persist the cookie on node B, so it survives restarts
>>>>>
>>>>> - when the request to node B returns, we know the Erlang-level
>>>>> inter-cluster communication is enabled and we can start adding
>>>>> the node on the CouchDB level. To do that, the “setup
>>>>> coordination node” does this to its own HTTP endpoint:
>>>>> PUT /nodes/node_b:5984 or the same thing with internal APIs.
>>>>>
>>>>> - Repeat for all nodes.
>>>>> - Fauxton keeps a list of all set up nodes for users to see.
>>>>
>>>> Question:
>>>> - Since Fauxton already knows all the nodes' admin credentials and all
>>>> the nodes are bound to the 0.0.0.0 iface (from the previous step), will
>>>> Fauxton automate joining the nodes into the cluster? This is about
>>>> skipping the "Repeat for all nodes" step.
>>>
>>> How does Fauxton know about the other nodes at this point?
>>> (I guess since the Erlang cluster is already set up, it could expose that
>>> info to Fauxton in a zeroconf kind of fashion and auto-populate the Fauxton
>>> UI with nodes that then can be joined with just a click of a button.)
>>
>> Oh, right. The "should be auto-filled from Fauxton" comment confused
>> me, so I thought that Fauxton was already aware of the node list.
>> Zeroconf is desirable, but it is another feature to add. So
>> everything is ok here.
>
> I don’t mean zeroconf specifically, but your comment gave me a new idea
> about using Erlang cluster functions to auto-detect nodes. Now that
> I think about it, I don’t think it works like I want, so let’s shelve
> that part. We can opt into proper zeroconf anytime later.
>
>>
>>>> - If some of my nodes have different admin credentials, is this the
>>>> blocker error case or should Fauxton ask me for these credentials?
>>>
>>> That’s why `add_node` takes a username and password as options, you
>>> can set that up if you want. / This could also be made an error case.
>>> It should certainly not be recommended.
>>>
>>
>> Right, same confusion by the "auto-filled" commentary (:
>>
>>>> - Any reasons for replacing a regular request to /_nodes with a custom
>>>> /_setup?
>>>
>>> I don’t know what /_nodes is. Do you mean /nodes? The reason this isn’t
>>> using /nodes at this point is that /nodes already has a special meaning
>>> and I didn’t want to complicate the existing logic. In addition, /nodes
>>> might have to be adjusted to carry the username and password of the target
>>> CouchDB to do the setup (if we otherwise keep the proposed model, happy
>>> to see alternatives, though!).
>>>
>>> If we can reduce all of what I outlined to `PUT /nodes/node_b|c|d`, that
>>> would be nice. Fauxton could then offer the setup UI based on whether /nodes
>>> has any entries. But I don’t know enough about the semantics and other
>>> uses of /nodes, so I haven’t thought about this option too much.
>>>
>>
>> *Bikeshedding alert*: shouldn't system database names start with a
>> leading underscore?(:
>> Yes, /nodes. Btw, nice idea about storing node credentials there -
>> this should help with cluster management in the case when admin
>> credentials are different everywhere. I only worry that this would
>> cause a conflict with cassim logic.
>
> /nodes is what BigCouch uses and I quickly conferred with rnewson on
> IRC. /nodes is only available on :5986, i.e. the per-node administration
> port. It is not part of the public API on :5984, so I think we can skip
> the underscoring there for now. (Incidentally, that is why we can’t do
> the setup just from Fauxton, because we need to write to :5986 from
> :5984.)
>
>
>>>> Point about cookie counts.
>>>
>>> Not sure I follow.
>>
>> I'd tried to overcome your reply. I could be wrong, but /nodes doesn't
>> know anything about Erlang cookies and how to work with them, while
>> your /_setup provides such functionality. I'm essentially trying to find
>> reasons to avoid having a special HTTP resource that will be used only
>> once in the whole cluster lifespan while others exist that can do the
>> same job. Setting up cookies does make it worth having.
>
> Yeah, that’s exactly my point, I think if we can make /nodes (that I now
> learn is node-only (see above)) understand Erlang cookie business, we
> could avoid /_setup, but since /nodes is only available on :5986 and we
> are trying to make sure new users never have to see anything but :5984,
> we need /_setup as a “proxy”. In addition, /nodes only “works” after the
> Erlang cookie is set up in all nodes, so we are in a chicken and egg
> situation here, and I think that’s the final thing that requires us to
> use /_setup.
>
>
>>>>> 4.a. When all nodes are added, click the [Finish Cluster Setup] button
>>>>> in Fauxton.
>>>>> - this does POST /_setup
>>>>>   {
>>>>>     "action": "finish_setup"
>>>>>   }
>>>>>
>>>>> b. Same as in a.
>>>>>
>>>>> - this manages the final setup bits, like creating the _users,
>>>>> _replicator and _db_updates endpoints and whatever else is needed.
>>>>> // TBD: collect what else is needed.
>>>>
>>>> This is the only useful thing that /_setup does from my current point
>>>> of view - everything else was just masking standard requests to the
>>>> existing API.
>>>
>>> Which existing API in particular?
>>>
>>> If you mean that this all can be done over /_config and /nodes, yes totally,
>>> but Fauxton on node_a can’t access /_config on node_b. That’s one of the
>>> reasons why I suggest using /_setup, so it can do all this from a single
>>> node via Fauxton. The other reason is that it is a dedicated API end-point
>>> that hides a lot of complexity instead of having end-users hit a bunch of
>>> seemingly random endpoints (although this *could* be hidden in Fauxton
>>> maybe, except for the cross domain issue).
>>
>> Yes, I mean /_config and /nodes. But why can't Fauxton access the
>> config on node_b? Especially if it knows the credentials and node_b
>> is bound to the 0.0.0.0 iface.
>
> Same origin policy in browsers, see above :)
>
>
>> About API usage complexity: followers of the Fauxton-driven way
>> really don't care what HTTP requests are made behind the scenes while
>> a nice spinner loops in their browser. For fellows of the console way
>> this isn't an issue either: small cluster installations are easy
>> to set up via "seemingly random endpoints" following our guidelines; for
>> bigger clusters these processes tend to be automated by provisioning
>> tools.
>
> Yeah, I can get behind the reasoning that complexity can be hidden
> behind Fauxton and cli setup can be a bit more complex. I just like
> the idea of making this a first-class setup citizen :)
>
>
>>>>> ## The Setup Endpoint
>>>>>
>>>>> This is not a REST-y endpoint, it is a simple state machine operated
>>>>> by HTTP POST with JSON bodies that have an `action` field.
>>>>>
>>>>> ### State 1: No Cluster Enabled
>>>>>
>>>>> This is right after starting a node for the first time, and any time
>>>>> before the cluster is enabled as outlined above.
>>>>>
>>>>> GET /_setup
>>>>> {"state": "cluster_disabled"}
>>>>>
>>>>> POST /_setup {"action":"enable_cluster"...} -> Transition to State 2
>>>>> POST /_setup {"action":"enable_cluster"...} with empty admin user/pass or
>>>>> invalid host/port or host/port not available -> Error
>>>>> POST /_setup {"action":"anything_but_enable_cluster"...} -> Error
>>>>>
>>>>
>>>> If "enable_cluster" only creates/sets up the admin and bind address, could
>>>> this step be skipped? The same actions can be done via
>>>> regular config setup.
>>>
>>> Yes! It just needs to ensure these things are done. If Fauxton detects
>>> they *are* done, it can skip the enable step and show the add_node interface
>>> right away.
>>
>> Good!
>>
>>>>
>>>>
>>>>> ### State 2: Cluster enabled, admin user set, waiting for nodes to be
>>>>> added.
>>>>>
>>>>> GET /_setup
>>>>> {"state":"cluster_enabled","nodes":[]}
>>>>>
>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>> POST /_setup {"action":"add_node"...} -> Stay in State 2, but return
>>>>> "nodes":["node B"] on GET
>>>>> POST /_setup {"action":"add_node"...} -> if target node not available,
>>>>> Error
>>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>>> POST /_setup {"action":"finish_cluster"} -> Transition to State 3
>>>>>
>>>>
>>>> Questions:
>>>> - How many nodes are required to be added? 1? 2? 3?...
>>>
>>> Doesn’t matter.
>>
>> Then a case:
>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>
>> will never happen since there will always be at least one node in the
>> cluster: the one that runs the setup (:
>
> From Fauxton, yes, but an API user could just call enable_cluster and
> then finish_cluster, and they should get an appropriate error :)
>
>
>>>> - How to remove an accidentally added node from the cluster?
>>>
>>> Delete from /nodes database. Could be added as a UI element in Fauxton.
>>
>> That's what I worried about: we add nodes via /_setup, but have to
>> remove them via /nodes. Consistency has to be preserved (:
>
> Now that I know about /nodes being on :5986 only, /_setup needs a
> remove_node action as well. Thanks for flagging that :)
>
>
>
>>
>>
>>>>
>>>>> ### State 3: Cluster set up, all nodes operational
>>>>>
>>>>> GET /_setup
>>>>> {"state":"cluster_finished","nodes":["node a", "node b", ...]}
>>>>>
>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>> POST /_setup {"action":"finish_cluster"...} -> Stay in State 3, do nothing
>>>>> POST /_setup {"action":"add_node"...} -> Error
>>>>> POST /_setup?i_know_what_i_am_doing=true {"action":"add_node"...} -> Add node, stay in State 3.
>>>>>
>>>>> // TBD: we need to persist the setup state somewhere.
>>>>>
>>>>
>>>> Questions:
>>>> - Why is adding a new node after finish_cluster a special case that
>>>> has to be marked with the "i_know_what_i_am_doing" parameter?
>>>
>>> Because I think it is not advisable to do this regularly, but someone might
>>> want to do this regardless (see next).
>>>
>>>
>>>> - How to enlarge / reduce the cluster after its setup or even disband it?
>>>
>>> Enlarge: see above.
>>> Reduce: delete from /nodes
>>> Disband: shut down all CouchDB processes :)
>>>
>>> I don’t know the BigCouch/Cloudant best practices for this. I’ll chalk
>>> this down as a “needs input from Cloudant people” :)
>>>
>>
>> You run me into recursion with the (see next) and (see above) notes! Nice
>> trick, but it is still unclear how to let your cluster grow - this isn't
>> some exceptional case. Reducing (not the accidental kind during network
>> issues) is what could happen more rarely. +1 for having more info from
>> Cloudant people (:
>>
>>
>>>> Or isn't this what /_setup should care about?
>>>
>>> In general, this isn’t really covered by the setup proposal here. I’d like
>>> to keep this out of scope for now, but we should have good answers to that
>>> going forward.
>>
>> Agreed.
>>
>>>> - What happens with the /_setup resource after finish_cluster? Any case
>>>> for it to be useful?
>>>
>>> Only for Fauxton to show the correct setup state.
>>>
>>
>> If so then I just figured out a better name for it: /_cluster
>> - it sets up the cluster as you planned
>> - it shows cluster state as you planned
>> - it allows managing cluster nodes in a way which isn't suitable
>> for the /nodes API (like setting cookies)
>> - it stays useful after cluster setup
>> and it could handle other cluster-wide tasks.
>>
>> What do you think?
>
> I’m not attached to any particular name, /_cluster or /_cluster_setup
> work for me :)
>
>
>>>> - How could /_setup help with an admin password change among all the
>>>> cluster nodes?
>>>
>>> At least on the first run setup, Fauxton can just keep the new password in
>>> memory and pre-fill the add_node screens with the same username and
>>> password. /_setup then transports it over.
>>>
>>> For later setups, I don’t know, as we would need the admin to enter the
>>> password in plaintext so we can send it. Alternatively, we could use the
>>> /_config API to read and send the PBKDF2 hash *waves hands*.
>>>
>>
>> "Fauxton can just keep the new password in memory" opens a door to the
>> issue where you accidentally refresh the page / close the tab / lose page
>> memory in some other way. Not a flaw, just a case to keep in mind.
>>
>> As for sending (or replicating) the PBKDF2 hash, that looks good to me.
>
> Yeah, that sounds like a more robust solution. We’d need something
> similar for the HTTP auth secret.

Meh, catch-22 again. Since /_setup is admin-only and we have already set
up the target node, we will need the cluster password in plain text added
there. We could mitigate some of the problems you outline by storing the
password in localStorage until we finish_cluster (or a timeout, whichever
occurs first).

>
>
>>>> - If I add a new node after the "finish_cluster" setup, will it have all
>>>> system databases (global_changes, cassim, _users...whatever else)
>>>> created?
>>>
>>> That is unspecified at this point. I’d need more input from the Cloudant
>>> people on this one. I’m happy to go either way, or make it an option for
>>> later joined nodes.
>>
>> Ok. Let's wait and see what the Cloudant people say. I'm pretty sure they
>> already know the solution for all these problems or at least know their
>> specifics.
>>
>> Thanks a lot, Jan! (:
>
> Thank you Alex, this helps a lot! :)
>
> * * *
>
> I’ll start putting this proposal into the wiki, so we can see it evolve from
> now on.
>
> * * *
>
> Best
> Jan
> --
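
For reference, the three /_setup states quoted in this thread can be modelled as a small in-memory state machine. This is only a sketch of the proposal as written, not CouchDB code: the class and method names (`SetupStateMachine`, `SetupError`, `post`, `get`) are made up for illustration, the `enable_cluster` body is assumed to carry the same `admin` object as `add_node`, and there is no HTTP, cookie exchange, reachability check, or persistence. It also only tracks nodes added via `add_node`, not the coordination node itself.

```python
class SetupError(Exception):
    """Raised when an action is not allowed in the current setup state."""


class SetupStateMachine:
    """In-memory model of the proposed /_setup states (hypothetical names)."""

    def __init__(self):
        self.state = "cluster_disabled"
        self.nodes = []

    def get(self):
        """Mirror the body returned by GET /_setup in the proposal."""
        if self.state == "cluster_disabled":
            return {"state": self.state}
        return {"state": self.state, "nodes": list(self.nodes)}

    def post(self, body, i_know_what_i_am_doing=False):
        """Apply one POST /_setup body and return the resulting GET view."""
        action = body.get("action")
        if self.state == "cluster_disabled":
            # State 1: only enable_cluster is accepted.
            if action != "enable_cluster":
                raise SetupError("cluster is not enabled yet")
            # Assumption: enable_cluster carries an "admin" object like add_node.
            admin = body.get("admin") or {}
            if not admin.get("user") or not admin.get("pass"):
                raise SetupError("empty admin user/pass")
            self.state = "cluster_enabled"
        elif self.state == "cluster_enabled":
            # State 2: add nodes, then finish.
            if action == "add_node":
                self._add_node(body)
            elif action == "finish_cluster":
                if not self.nodes:
                    raise SetupError("no nodes set up")
                self.state = "cluster_finished"
            else:
                raise SetupError("action not allowed: %r" % (action,))
        else:
            # State 3: finish_cluster is a no-op, add_node needs the override.
            if action == "finish_cluster":
                pass
            elif action == "add_node" and i_know_what_i_am_doing:
                self._add_node(body)
            else:
                raise SetupError("cluster setup is already finished")
        return self.get()

    def _add_node(self, body):
        # A real implementation would contact the target node here.
        host = (body.get("node") or {}).get("host")
        if not host:
            raise SetupError("missing node host")
        self.nodes.append(host)
```

Calling `post({"action": "finish_cluster"})` right after `enable_cluster` raises `SetupError`, which is the API-user case discussed above; a `remove_node` action would slot into the State 2 and State 3 branches in the same way.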
