Great news, Jan!

--
,,,^..^,,,
On Fri, Nov 7, 2014 at 5:49 PM, Jan Lehnardt <[email protected]> wrote:
> Hey all,
>
> I’ve spent some time coding up the /_cluster_setup endpoint and the basic
> happy case already works, yay :)
>
> You can follow along here
> https://git-wip-us.apache.org/repos/asf?p=couchdb-setup.git;a=summary or here
> https://github.com/janl/couchdb-setup
>
> Any feedback welcome.
>
> Next steps are:
> - collect feedback
> - test all error conditions
> - solicit help from the Fauxton team to build the frontend bits of this
>   *puppyeyes* <3
>
> Woop.
>
> Best
> Jan
> --
>
>> On 31 Oct 2014, at 17:31 , Jan Lehnardt <[email protected]> wrote:
>>
>>> On 31 Oct 2014, at 14:33 , Jan Lehnardt <[email protected]> wrote:
>>>
>>>> On 26 Oct 2014, at 22:11 , Alexander Shorin <[email protected]> wrote:
>>>>
>>>> On Sun, Oct 26, 2014 at 11:25 PM, Jan Lehnardt <[email protected]> wrote:
>>>>> Definitely, sorry for missing that bit.
>>>>>
>>>>
>>>> No worries. Let's clear this up (:
>>>>
>>>>>> - If a node already has admin party fixed, should it accept new admin
>>>>>> credentials?
>>>>>
>>>>> Good question, I’d say if an admin already exists, no new admin
>>>>> credentials are needed.
>>>>>
>>>>>> - Any reasons to replace 1-3 PUT requests to /_config with a single POST
>>>>>> one in this case?
>>>>>
>>>>> I’m not sure what the 1-3 PUT requests are?
>>>>
>>>> These ones:
>>>> curl -XPUT http://localhost:5984/_config/admins/root -d '"password"' -H 'Content-Type: application/json'
>>>> curl -XPUT http://localhost:5984/_config/httpd/bind_address -d '"0.0.0.0"' -H 'Content-Type: application/json'
>>>> curl -XPUT http://localhost:5984/_config/httpd/port -d '"5984"' -H 'Content-Type: application/json'
>>>>
>>>> The last two are optional, just as the related fields are optional
>>>> for the /_setup call.
>>>
>>> Thanks, got it. If it is only these settings, then the one /_setup call
>>> with action enable_cluster could be replaced by this.
>>> I think having a deliberate /_setup that duplicates some /_config stuff is
>>> actually helpful.
>>>
>>> Incidentally, how does /_config behave in a cluster? Does it write
>>> back to all nodes’ local.ini files?
>>>
>>>>
>>>>>> 3. Pick any one node, for simplicity use the first one, to be the
>>>>>>> “setup coordination node”.
>>>>>>> - this is a “master” node that manages the setup and requires all
>>>>>>>   other nodes to be able to see it and vice versa. Setup won’t work
>>>>>>>   with unavailable nodes (duh). The notion of “master” will be gone
>>>>>>>   once the setup is finished. At that point, the system has no
>>>>>>>   master node. Ignore I ever said “master”.
>>>>>>>
>>>>>>> a. Go to Fauxton / Cluster Setup, once we have enabled the cluster, the
>>>>>>>    UI shows an “Add Node” interface with the fields admin, and node:
>>>>>>> - POST to /_setup with
>>>>>>>   {
>>>>>>>     "action": "add_node",
>>>>>>>     "admin": { // should be auto-filled from Fauxton
>>>>>>>       "user": "username",
>>>>>>>       "pass": "password"
>>>>>>>     },
>>>>>>>     "node": {
>>>>>>>       "host": "hostname",
>>>>>>>       ["port": 5984]
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> b. as in a, but without the Fauxton bits, just POST to /_setup
>>>>>>> - this request will do this:
>>>>>>> - on the “setup coordination node”:
>>>>>>> - check if we have an Erlang Cookie Secret. If not, generate
>>>>>>>   a UUID and set the erlang cookie to that UUID.
>>>>>>> // TBD: persist the cookie, so it survives restarts
>>>>>>> - make a POST request to the node specified in the body above
>>>>>>>   using the admin credentials in the body above:
>>>>>>>   POST to http://username:password@node_b:5984/_setup with:
>>>>>>>   {
>>>>>>>     "action": "receive_cookie",
>>>>>>>     "cookie": "<secretcookie>"
>>>>>>>   }
>>>>>>> // TBD: persist the cookie on node B, so it survives restarts
>>>>>>>
>>>>>>> - when the request to node B returns, we know the Erlang-level
>>>>>>>   inter-cluster communication is enabled and we can start adding
>>>>>>>   the node on the CouchDB level. To do that, the “setup
>>>>>>>   coordination node” does this to its own HTTP endpoint:
>>>>>>>   PUT /nodes/node_b:5984 or the same thing with internal APIs.
>>>>>>>
>>>>>>> - Repeat for all nodes.
>>>>>>> - Fauxton keeps a list of all set up nodes for users to see.
>>>>>>
>>>>>> Question:
>>>>>> - Since Fauxton already knows all the nodes' admin credentials and all
>>>>>> the nodes are bound to the 0.0.0.0 iface (from the previous step), will
>>>>>> Fauxton automate joining the nodes into the cluster? This is about
>>>>>> skipping the "Repeat for all nodes" step
>>>>>
>>>>> How does Fauxton know about the other nodes at this point?
>>>>> (I guess since the Erlang cluster is already set up, it could expose that
>>>>> info to Fauxton in a zeroconf kind of fashion and auto-populate the
>>>>> Fauxton UI with nodes that then can be joined with just a click of a
>>>>> button.)
>>>>
>>>> Oh, right. The "should be auto-filled from Fauxton" comment confused
>>>> me, so I thought that Fauxton was already aware of the node list.
>>>> Zeroconf is desirable, but this is another feature to add. So
>>>> everything is ok here.
>>>
>>> I don’t mean zeroconf specifically, but your comment gave me a new idea
>>> about using Erlang cluster functions to auto-detect nodes, but now that
>>> I think about it, I don’t think it works like I want, so let’s shelve
>>> that part.
>>> We can opt into proper zeroconf anytime later.
>>>
>>>>
>>>>>> - If some of my nodes have different admin credentials, is this a
>>>>>> blocking error case or should Fauxton ask me for these credentials?
>>>>>
>>>>> That’s why `add_node` takes a username and password as options, you
>>>>> can set that up if you want. / This could also be made an error case.
>>>>> It should certainly not be recommended.
>>>>>
>>>>
>>>> Right, same confusion from the "auto-filled" comment (:
>>>>
>>>>>> - Any reasons for replacing a regular request to /_nodes with a custom
>>>>>> /_setup?
>>>>>
>>>>> I don’t know what /_nodes is. Do you mean /nodes? The reason this isn’t
>>>>> using /nodes at this point is that /nodes already has a special meaning
>>>>> and I didn’t want to complicate the existing logic. In addition, /nodes
>>>>> might have to be adjusted to carry the username and password of the target
>>>>> CouchDB to do the setup (if we otherwise keep the proposed model, happy
>>>>> to see alternatives, though!).
>>>>>
>>>>> If we can reduce all of what I outlined to `PUT /nodes/node_b|c|d`, that
>>>>> would be nice. Fauxton could then offer the setup UI based on whether
>>>>> /nodes has any entries. But I don’t know enough about the semantics and
>>>>> other uses of /nodes, so I haven’t thought about this option too much.
>>>>>
>>>>
>>>> *Bikeshedding alert*: shouldn't system database names start with a
>>>> leading underscore? (:
>>>> Yes, /nodes. Btw, nice idea about storing node credentials there -
>>>> this should help with cluster management in the case when admin
>>>> credentials are different everywhere. I only worry that this would
>>>> cause a conflict with cassim logic.
>>>
>>> /nodes is what BigCouch uses and I quickly conferred with rnewson on
>>> IRC. /nodes is only available on :5986, i.e. the per-node administration
>>> port. It is not part of the public API on :5984, so I think we can skip
>>> the underscoring there for now.
>>> (Incidentally, that is why we can’t do the setup just from Fauxton,
>>> because we need to write to :5986 from :5984.)
>>>
>>>>>> Point about cookie counts.
>>>>>
>>>>> Not sure I follow.
>>>>
>>>> I'd tried to overcome your reply. I could be wrong, but /nodes doesn't
>>>> know anything about Erlang cookies and how to work with them, while
>>>> your /_setup provides such functionality. I'm ultimately trying to find
>>>> reasons to avoid having a special HTTP resource which will be used only
>>>> once in the whole cluster lifespan while there exist others which are
>>>> able to do the same job. Setting up cookies makes sense as a reason to
>>>> have it instead.
>>>
>>> Yeah, that’s exactly my point, I think if we can make /nodes (that I now
>>> learn is node-only (see above)) understand Erlang cookie business, we
>>> could avoid /_setup, but since /nodes is only available on :5986 and we
>>> are trying to make sure new users never have to see anything but :5984,
>>> we need /_setup as a “proxy”. In addition, /nodes only “works” after the
>>> Erlang cookie is set up in all nodes, so we are in a chicken and egg
>>> situation here, and I think that’s the final thing that requires us to
>>> use /_setup.
>>>
>>>> 4.a. When all nodes are added, click the [Finish Cluster Setup] button
>>>>>>> in Fauxton.
>>>>>>> - this does POST /_setup
>>>>>>>   {
>>>>>>>     "action": "finish_setup"
>>>>>>>   }
>>>>>>>
>>>>>>> b. Same as in a.
>>>>>>>
>>>>>>> - this manages the final setup bits, like creating the _users,
>>>>>>>   _replicator and _db_updates endpoints and whatever else is needed.
>>>>>>> // TBD: collect what else is needed.
>>>>>>
>>>>>> This is the only useful thing that /_setup does from my current point
>>>>>> of view - everything else was just masking standard requests to the
>>>>>> existing API.
>>>>>
>>>>> Which existing API in particular?
>>>>>
>>>>> If you mean that this all can be done over /_config and /nodes, yes
>>>>> totally, but Fauxton on node_a can’t access /_config on node_b.
>>>>> That’s one of the reasons why I suggest using /_setup, so it can do all
>>>>> this from a single node via Fauxton. The other reason is that it is a
>>>>> dedicated API end-point that hides a lot of complexity instead of having
>>>>> end-users hit a bunch of seemingly random endpoints (although this
>>>>> *could* be hidden in Fauxton maybe, except for the cross domain issue).
>>>>
>>>> Yes, I mean /_config and /nodes. But why can't Fauxton access the
>>>> config on node_b? Especially if it knows the credentials and node_b
>>>> is bound to the 0.0.0.0 iface.
>>>
>>> Same origin policy in browsers, see above :)
>>>
>>>> About API usage complexity: followers of the Fauxton-driven way
>>>> really don't care about what HTTP requests are made behind the scenes
>>>> while a nice spinner loops in their browser. For fellows of the console
>>>> way this isn't an issue either: small cluster installations are easy
>>>> to set up via "seemingly random endpoints" following our guidelines; for
>>>> bigger clusters these processes tend to be automated by provisioning
>>>> tools.
>>>
>>> Yeah, I can get behind the reasoning that complexity can be hidden
>>> behind Fauxton and cli setup can be a bit more complex. I just like
>>> the idea of making this a first-class setup citizen :)
>>>
>>>>>>> ## The Setup Endpoint
>>>>>>>
>>>>>>> This is not a REST-y endpoint, it is a simple state machine operated
>>>>>>> by HTTP POST with JSON bodies that have an `action` field.
>>>>>>>
>>>>>>> ### State 1: No Cluster Enabled
>>>>>>>
>>>>>>> This is right after starting a node for the first time, and any time
>>>>>>> before the cluster is enabled as outlined above.
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state": "cluster_disabled"}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Transition to State 2
>>>>>>> POST /_setup {"action":"enable_cluster"...} with empty admin user/pass
>>>>>>> or invalid host/port or host/port not available -> Error
>>>>>>> POST /_setup {"action":"anything_but_enable_cluster"...} -> Error
>>>>>>>
>>>>>>
>>>>>> If "enable_cluster" only creates/sets up the admin and bind address,
>>>>>> could this step be skipped? Because the same actions are possible via
>>>>>> regular config setup.
>>>>>
>>>>> Yes! It just needs to ensure these things are done. If Fauxton detects
>>>>> they *are* done, it can skip the enable step and show the add_node
>>>>> interface right away.
>>>>
>>>> Good!
>>>>
>>>>>>
>>>>>>> ### State 2: Cluster enabled, admin user set, waiting for nodes to be
>>>>>>> added.
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state":"cluster_enabled","nodes":[]}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>>>> POST /_setup {"action":"add_node"...} -> Stay in State 2, but return
>>>>>>> "nodes":["node B"] on GET
>>>>>>> POST /_setup {"action":"add_node"...} -> if target node not available,
>>>>>>> Error
>>>>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>>>>> POST /_setup {"action":"finish_cluster"} -> Transition to State 3
>>>>>>>
>>>>>>
>>>>>> Questions:
>>>>>> - How many nodes are required to be added? 1? 2? 3?...
>>>>>
>>>>> Doesn’t matter.
>>>>
>>>> Then a case:
>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>>
>>>> will never happen since there will always be at least one node in the
>>>> cluster - the one that runs the setup (:
>>>
>>> From Fauxton, yes, but an API user could just call enable_cluster and
>>> then finish_cluster, and they should get an appropriate error :)
>>>
>>>>>> - How to remove an accidentally added node from the cluster?
>>>>>
>>>>> Delete from /nodes database.
>>>>> Could be added as a UI element in Fauxton.
>>>>
>>>> That's what I worried about: we add nodes via /_setup, but have to
>>>> remove them via /nodes. Consistency has to be preserved (:
>>>
>>> Now that I know about /nodes being on :5986 only, /_setup needs a
>>> remove_node action as well. Thanks for flagging that :)
>>>
>>>>>>
>>>>>>> ### State 3: Cluster set up, all nodes operational
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state":"cluster_finished","nodes":["node a", "node b", ...]}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>>>> POST /_setup {"action":"finish_cluster"...} -> Stay in State 3, do
>>>>>>> nothing
>>>>>>> POST /_setup {"action":"add_node"...} -> Error
>>>>>>> POST /_setup?i_know_what_i_am_doing=true {"action":"add_node"...} ->
>>>>>>> Add node, stay in State 3.
>>>>>>>
>>>>>>> // TBD: we need to persist the setup state somewhere.
>>>>>>>
>>>>>>
>>>>>> Questions:
>>>>>> - Why is adding a new node after finish_cluster a special case that
>>>>>> has to be marked with the "i_know_what_i_am_doing" parameter?
>>>>>
>>>>> Because I think it is not advisable to do this regularly, but someone
>>>>> might want to do this regardless (see next).
>>>>>
>>>>>> - How to enlarge / reduce a cluster after its setup or even disband it?
>>>>>
>>>>> Enlarge: see above.
>>>>> Reduce: delete from /nodes
>>>>> Disband: shut down all CouchDB processes :)
>>>>>
>>>>> I don’t know the BigCouch/Cloudant best practices for this. I’ll chalk
>>>>> this down as a “needs input from Cloudant people” :)
>>>>>
>>>>
>>>> You run me into recursion with the (see next) and (see above) notes! Nice
>>>> trick, but it is still unclear how to let your cluster grow - this isn't
>>>> some exceptional case. Reducing (not the temporary kind during network
>>>> issues) is what could happen more rarely. +1 for having more info from
>>>> Cloudant people (:
>>>>
>>>>>> Or isn't this what /_setup should care about?
>>>>>
>>>>> In general, this isn’t really covered by the setup proposal here. I’d like
>>>>> to keep this out of scope for now, but we should have good answers to that
>>>>> going forward.
>>>>
>>>> Agreed.
>>>>
>>>>>> - What happens with the /_setup resource after finish_cluster? Any case
>>>>>> for it to be useful?
>>>>>
>>>>> Only for Fauxton to show the correct setup state.
>>>>>
>>>>
>>>> If so then I just figured out a better name for it: /_cluster
>>>> - it sets up the cluster as you planned
>>>> - it shows cluster state as you planned
>>>> - it allows managing cluster nodes in a way which isn't suitable
>>>>   for the /nodes API (like setting cookies)
>>>> - it becomes useful after cluster setup
>>>> and it could handle other cluster-wide tasks.
>>>>
>>>> What do you think?
>>>
>>> I’m not attached to any particular name; /_cluster or /_cluster_setup
>>> work for me :)
>>>
>>>>>> - How could /_setup help with an admin password change across all
>>>>>> cluster nodes?
>>>>>
>>>>> At least on the first run setup, Fauxton can just keep the new password in
>>>>> memory and pre-fill the add_node screens with the same username and
>>>>> password. /_setup then transports it over.
>>>>>
>>>>> For later setups, I don’t know, as we would have to have the admin enter
>>>>> the password in plaintext so we can send it. Alternatively, we could use
>>>>> the /_config API to read and send the PBKDF2 hash *waves hands*.
>>>>>
>>>>
>>>> "Fauxton can just keep the new password in memory" opens the door to the
>>>> issue when you accidentally refresh the page / close the tab / lose page
>>>> memory in some other way. Not a flaw, just a case to remember.
>>>>
>>>> As for sending (or replicating) the PBKDF2 hash, that looks good to me.
>>>
>>> Yeah, that sounds like a more robust solution. We’d need something
>>> similar for the HTTP auth secret.
>>
>> Meh, catch-22 again.
>> Since /_setup is admin-only and we have already set up
>> the target node, we will need the cluster password in plain text added there.
>> We could mitigate some of the problems you outline by storing the password
>> in localStorage until we finish_cluster (or a timeout, whichever occurs
>> first).
>>
>>>
>>>>>> - If I add a new node after "finish_cluster" setup, will it have all
>>>>>> system databases (global_changes, cassim, _users...whatever else)
>>>>>> created?
>>>>>
>>>>> That is unspecified at this point. I’d need more input from the Cloudant
>>>>> people on this one. I’m happy to go either way, or make it an option for
>>>>> later joined nodes.
>>>>
>>>> Ok. Let's wait for what the Cloudant people say. I'm pretty sure they
>>>> already know the solution for all these problems or at least know their
>>>> specifics.
>>>>
>>>> Thanks a lot, Jan! (:
>>>
>>> Thank you Alex, this helps a lot! :)
>>>
>>> * * *
>>>
>>> I’ll start putting this proposal into the wiki, so we can see it evolve from
>>> now on.
>>>
>>> * * *
>>>
>>> Best
>>> Jan
>>> --
>
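
For anyone following along, the three states described under "The Setup Endpoint" in the quoted proposal can be modelled with a small in-memory sketch. This is only an illustration of the transitions listed in the thread, not the couchdb-setup code: the names `ClusterSetup` and `SetupError`, the `admin` field on `enable_cluster` (borrowed from the `add_node` body), and the `i_know_what_i_am_doing` keyword standing in for the query parameter are all assumptions. It does no HTTP, no cookie handling, and no host/port or availability checks, and it only tracks nodes added via `add_node`.

```python
class SetupError(Exception):
    """Stands in for every transition the proposal marks as '-> Error'."""


class ClusterSetup:
    """Hypothetical in-memory model of one node's /_setup state."""

    def __init__(self):
        self.state = "cluster_disabled"  # State 1
        self.nodes = []                  # hosts added via add_node

    def get(self):
        """Mimic GET /_setup."""
        if self.state == "cluster_disabled":
            return {"state": self.state}
        return {"state": self.state, "nodes": list(self.nodes)}

    def post(self, body, i_know_what_i_am_doing=False):
        """Mimic POST /_setup with a JSON body that has an `action` field."""
        action = body.get("action")

        if self.state == "cluster_disabled":
            # State 1: only enable_cluster is accepted.
            if action != "enable_cluster":
                raise SetupError("cluster is not enabled yet")
            admin = body.get("admin") or {}  # assumed field layout
            if not admin.get("user") or not admin.get("pass"):
                raise SetupError("empty admin user/pass")
            self.state = "cluster_enabled"

        elif self.state == "cluster_enabled":
            # State 2: add nodes, then finish.
            if action == "add_node":
                self._add_node(body)
            elif action == "finish_cluster":
                if not self.nodes:
                    raise SetupError("no nodes set up")
                self.state = "cluster_finished"
            else:
                raise SetupError("action not allowed: %r" % action)

        else:
            # State 3: finish_cluster is a no-op, add_node needs the override.
            if action == "finish_cluster":
                pass
            elif action == "add_node" and i_know_what_i_am_doing:
                self._add_node(body)
            else:
                raise SetupError("action not allowed: %r" % action)

        return self.get()

    def _add_node(self, body):
        host = (body.get("node") or {}).get("host")
        if not host:
            raise SetupError("add_node needs node.host")
        self.nodes.append(host)


if __name__ == "__main__":
    setup = ClusterSetup()
    setup.post({"action": "enable_cluster",
                "admin": {"user": "username", "pass": "password"}})
    setup.post({"action": "add_node", "node": {"host": "node_b"}})
    print(setup.post({"action": "finish_cluster"}))
```

A real implementation would also have to persist the state and the cookie, which the proposal still lists as TBD.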
