Great news, Jan!
--
,,,^..^,,,

On Fri, Nov 7, 2014 at 5:49 PM, Jan Lehnardt <[email protected]> wrote:
> Hey all,
>
> I’ve spent some time coding up the /_cluster_setup endpoint and the basic 
> happy case already works, yay :)
>
> You can follow along here 
> https://git-wip-us.apache.org/repos/asf?p=couchdb-setup.git;a=summary or here 
> https://github.com/janl/couchdb-setup
>
> Any feedback welcome.
>
> Next steps are:
> - collecting feedback
> - test all error conditions
> - solicit help from the Fauxton team to build the frontend bits of this 
> *puppyeyes* <3
>
> Woop.
>
> Best
> Jan
> --
>> On 31 Oct 2014, at 17:31 , Jan Lehnardt <[email protected]> wrote:
>>
>>>
>>> On 31 Oct 2014, at 14:33 , Jan Lehnardt <[email protected]> wrote:
>>>
>>>>
>>>> On 26 Oct 2014, at 22:11 , Alexander Shorin <[email protected]> wrote:
>>>>
>>>> On Sun, Oct 26, 2014 at 11:25 PM, Jan Lehnardt <[email protected]> wrote:
>>>>> Definitely, sorry for missing that bit.
>>>>>
>>>>
>>>> No worries. Let's clear this up (:
>>>>
>>>>>> - If a node already has its admin party fixed, should it accept new
>>>>>> admin credentials?
>>>>>
>>>>> Good question, I’d say if an admin already exists, no new admin
>>>>> credentials are needed.
>>>>>
>>>>>
>>>>>> - Any reason to replace 1-3 PUT requests to /_config with a single
>>>>>> POST in this case?
>>>>>
>>>>> I’m not sure what the 1-3 PUT requests are?
>>>>
>>>> These ones:
>>>> curl -XPUT http://localhost:5984/_config/admins/root \
>>>>   -d '"password"' -H 'Content-Type: application/json'
>>>> curl -XPUT http://localhost:5984/_config/httpd/bind_address \
>>>>   -d '"0.0.0.0"' -H 'Content-Type: application/json'
>>>> curl -XPUT http://localhost:5984/_config/httpd/port \
>>>>   -d '"5984"' -H 'Content-Type: application/json'
>>>>
>>>> The last two are optional, just as the related fields are optional
>>>> for the /_setup call.
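
For readers following along, the 1-3 PUT requests above can also be built programmatically. This is only a minimal sketch: it assumes the node listens on localhost:5984 as in the curl lines and that the admin entry lives in CouchDB's `admins` config section. The helper names `config_put` and `basic_setup_requests` are made up for illustration, and nothing is sent over the network.

```python
import json
from urllib.request import Request


def config_put(base_url, section, key, value):
    """Build (without sending) a PUT to /_config/<section>/<key>."""
    return Request(
        url=f"{base_url}/_config/{section}/{key}",
        data=json.dumps(value).encode("utf-8"),  # config values are JSON strings
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def basic_setup_requests(base_url, user, password, bind_address=None, port=None):
    """Return the 1-3 requests: the admin is required, the other two optional."""
    requests = [config_put(base_url, "admins", user, password)]
    if bind_address is not None:
        requests.append(config_put(base_url, "httpd", "bind_address", bind_address))
    if port is not None:
        requests.append(config_put(base_url, "httpd", "port", str(port)))
    return requests


if __name__ == "__main__":
    for req in basic_setup_requests("http://localhost:5984", "root", "password",
                                    bind_address="0.0.0.0", port=5984):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing each request to `urllib.request.urlopen` would actually send it; the sketch stops short of that on purpose.
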
>>>
>>> Thanks, got it. If it is only these settings, then the one /_setup call
>>> with action enable_cluster could be replaced by this. I think having
>>> a deliberate /_setup that duplicates some /_config stuff is actually
>>> helpful.
>>>
>>> Incidentally, how does /_config behave in a cluster? Does it write
>>> back to all nodes’ local.ini files?
>>>
>>>
>>>>
>>>>>>> 3. Pick any one node, for simplicity use the first one, to be the
>>>>>>> “setup coordination node”.
>>>>>>> - this is a “master” node that manages the setup and requires all
>>>>>>> other nodes to be able to see it and vice versa. Setup won’t work
>>>>>>> with unavailable nodes (duh). The notion of “master” will be gone
>>>>>>> once the setup is finished. At that point, the system has no
>>>>>>> master node. Ignore I ever said “master”.
>>>>>>>
>>>>>>> a. Go to Fauxton / Cluster Setup, once we have enabled the cluster, the
>>>>>>> UI shows an “Add Node” interface with the fields admin, and node:
>>>>>>> - POST to /_setup with
>>>>>>> {
>>>>>>>  "action": "add_node",
>>>>>>>  "admin": { // should be auto-filled from Fauxton
>>>>>>>    "user": "username",
>>>>>>>    "pass": "password"
>>>>>>>  },
>>>>>>>  "node": {
>>>>>>>    "host": "hostname",
>>>>>>>    ["port": 5984]
>>>>>>>  }
>>>>>>> }
>>>>>>>
>>>>>>> b. as in a, but without the Fauxton bits, just POST to /_setup
>>>>>>> - this request will do this:
>>>>>>> - on the “setup coordination node”:
>>>>>>> - check if we have an Erlang Cookie Secret. If not, generate
>>>>>>>  a UUID and set the Erlang cookie to that UUID.
>>>>>>>  // TBD: persist the cookie, so it survives restarts
>>>>>>> - make a POST request to the node specified in the body above
>>>>>>>  using the admin credentials in the body above:
>>>>>>>  POST to http://username:password@node_b:5984/_setup with:
>>>>>>>  {
>>>>>>>    "action": "receive_cookie",
>>>>>>>    "cookie": "<secretcookie>"
>>>>>>>  }
>>>>>>>  // TBD: persist the cookie on node B, so it survives restarts
>>>>>>>
>>>>>>> - when the request to node B returns, we know the Erlang-level
>>>>>>>  inter-cluster communication is enabled and we can start adding
>>>>>>>  the node on the CouchDB level. To do that, the “setup
>>>>>>>  coordination node” does this to its own HTTP endpoint:
>>>>>>>  PUT /nodes/node_b:5984 or the same thing with internal APIs.
>>>>>>>
>>>>>>> - Repeat for all nodes.
>>>>>>> - Fauxton keeps a list of all set up nodes for users to see.
>>>>>>
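
The add_node flow described in step 3 can be sketched as plain data, which may help when reasoning about ordering. This is a hedged illustration, not the implementation in couchdb-setup: the function names `add_node_body` and `plan_add_node` are hypothetical, credentials are kept in a separate `auth` field rather than in the URL, and the plan is only returned, never executed.

```python
import uuid


def add_node_body(user, password, host, port=5984):
    """Body for POST /_setup with action add_node (port is optional)."""
    return {
        "action": "add_node",
        "admin": {"user": user, "pass": password},
        "node": {"host": host, "port": port},
    }


def plan_add_node(body, cookie=None):
    """List the steps the coordination node would take for one add_node call.

    `cookie` is the existing Erlang cookie, if any; a UUID is generated
    otherwise. Nothing is executed here, the plan is only data.
    """
    cookie = cookie or uuid.uuid4().hex
    host = body["node"]["host"]
    port = body["node"].get("port", 5984)
    return [
        {   # 1. hand the cookie to the target node, authenticated as admin
            "method": "POST",
            "url": f"http://{host}:{port}/_setup",
            "auth": (body["admin"]["user"], body["admin"]["pass"]),
            "json": {"action": "receive_cookie", "cookie": cookie},
        },
        {   # 2. register the node on the coordination node itself
            "method": "PUT",
            "url": f"/nodes/{host}:{port}",
            "auth": None,
            "json": None,
        },
    ]
```

Calling `plan_add_node(add_node_body("username", "password", "node_b"))` yields the receive_cookie POST followed by the /nodes PUT, in that order.
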
>>>>>> Question:
>>>>>> - Since Fauxton already knows all the nodes' admin credentials and all
>>>>>> the nodes are bound to the 0.0.0.0 iface (from the previous step), will
>>>>>> Fauxton automate joining the nodes into the cluster? This is about
>>>>>> skipping the "Repeat for all nodes" step.
>>>>>
>>>>> How does Fauxton know about the other nodes at this point?
>>>>> (I guess since the Erlang cluster is already set up, it could expose that
>>>>> info to Fauxton in a zeroconf kind of fashion and auto-populate the
>>>>> Fauxton UI with nodes that then can be joined with just a click of a
>>>>> button.)
>>>>
>>>> Oh, right. The "should be auto-filled from Fauxton" comment confused
>>>> me, so I thought that Fauxton was already aware of the node list.
>>>> Zeroconf is desirable, but that is another feature to add. So
>>>> everything is ok here.
>>>
>>> I don’t mean zeroconf specifically, but your comment gave me a new idea,
>>> about using Erlang cluster functions to auto-detect nodes, but now that
>>> I think about it, I don’t think it works like I want, so let’s shelve
>>> that part. We can opt into proper zeroconf anytime later.
>>>
>>>>
>>>>>> - If some of my nodes have different admin credentials, is this the
>>>>>> blocker error case or should Fauxton ask me for these credentials?
>>>>>
>>>>> That’s why `add_node` takes a username and password as options, you
>>>>> can set that up if you want. / This could also be made an error case.
>>>>> It should certainly not be recommended.
>>>>>
>>>>
>>>> Right, same confusion by "auto-filled" commentary (:
>>>>
>>>>>> - Any reasons for replacing regular request to /_nodes with custom
>>>>>> /_setup?
>>>>>
>>>>> I don’t know what /_nodes is. Do you mean /nodes? — The reason this isn’t
>>>>> using /nodes at this point is that /nodes already has a special meaning
>>>>> and I didn’t want to complicate the existing logic. In addition, /nodes
>>>>> might have to be adjusted to carry the username and password of the target
>>>>> CouchDB to do the setup (if we otherwise keep the proposed model, happy
>>>>> to see alternatives, though!).
>>>>>
>>>>> If we can reduce all of what I outlined to `PUT /nodes/node_b|c|d`, that
>>>>> would be nice. Fauxton could then offer the setup UI based on whether
>>>>> /nodes has any entries. But I don’t know enough about the semantics and
>>>>> other uses of /nodes, so I haven’t thought about this option too much.
>>>>>
>>>>
>>>> *Bikeshedding alert*: shouldn't system database names start with a
>>>> leading underscore? (:
>>>> Yes, /nodes. Btw, nice idea about storing node credentials there -
>>>> this should help with cluster management when admin credentials
>>>> differ between nodes. I only worry that this would cause a conflict
>>>> with the cassim logic.
>>>
>>> /nodes is what BigCouch uses and I quickly conferred with rnewson on
>>> IRC. /nodes is only available on :5986, i.e. the per-node administration
>>> port. It is not part of the public API on :5984, so I think we can skip
>>> the underscoring there for now. (Incidentally, that is why we can’t do
>>> the setup just from Fauxton, because we need to write to :5986 from
>>> :5984.)
>>>
>>>
>>>>>> Point about cookie counts.
>>>>>
>>>>> Not sure I follow.
>>>>
>>>> I was trying to anticipate your reply. I could be wrong, but /nodes doesn't
>>>> know anything about Erlang cookies or how to work with them, while
>>>> your /_setup provides that functionality. I'm essentially trying to find
>>>> reasons to avoid having a special HTTP resource that will be used only
>>>> once in the whole cluster lifespan when other resources exist that
>>>> can do the same job. Setting up cookies is a reason to have it
>>>> after all.
>>>
>>> Yeah, that’s exactly my point. I think if we can make /nodes (which I now
>>> learn is node-only, see above) understand Erlang cookie business, we
>>> could avoid /_setup, but since /nodes is only available on :5986 and we
>>> want new users to never have to see anything but :5984, we need
>>> /_setup as a “proxy”. In addition, /nodes only “works” after the Erlang
>>> cookie is set up on all nodes, so we are in a chicken and egg situation
>>> here, and I think that’s the final thing that requires us to use /_setup.
>>>
>>>
>>>>>>> 4.a. When all nodes are added, click the [Finish Cluster Setup] button
>>>>>>> in Fauxton.
>>>>>>> - this does POST /_setup
>>>>>>> {
>>>>>>>  "action": "finish_cluster"
>>>>>>> }
>>>>>>>
>>>>>>> b. Same as in a.
>>>>>>>
>>>>>>> - this manages the final setup bits, like creating the _users,
>>>>>>> _replicator and _db_updates endpoints and whatever else is needed.
>>>>>>> // TBD: collect what else is needed.
>>>>>>
>>>>>> This is the only useful thing that /_setup does from my current point
>>>>>> of view - everything else just masks standard requests to the
>>>>>> existing API.
>>>>>
>>>>> Which existing API in particular?
>>>>>
>>>>> If you mean that this all can be done over /_config and /nodes, yes
>>>>> totally, but Fauxton on node_a can’t access /_config on node_b. That’s
>>>>> one of the reasons why I suggest using /_setup, so it can do all this
>>>>> from a single node via Fauxton. The other reason is that it is a
>>>>> dedicated API end-point that hides a lot of complexity instead of having
>>>>> end-users hit a bunch of seemingly random endpoints (although this
>>>>> *could* be hidden in Fauxton maybe, except for the cross domain issue).
>>>>
>>>> Yes, I mean /_config and /nodes. But why can't Fauxton access the
>>>> config on node_b? Especially if it knows the credentials and node_b
>>>> is bound to the 0.0.0.0 iface.
>>>
>>> Same origin policy in browsers, see above :)
>>>
>>>
>>>> About API usage complexity: followers of the Fauxton-driven way
>>>> really don't care which HTTP requests are made behind the scenes while
>>>> a nice spinner loops in their browser. For fans of the console way
>>>> this isn't an issue either: small cluster installations are easy
>>>> to set up via "seemingly random endpoints" following our guidelines; for
>>>> bigger clusters these processes tend to be automated by provisioning
>>>> tools.
>>>
>>> Yeah, I can get behind the reasoning that complexity can be hidden
>>> behind Fauxton and cli setup can be a bit more complex. I just like
>>> the idea of making this a first-class setup citizen :)
>>>
>>>
>>>>>>> ## The Setup Endpoint
>>>>>>>
>>>>>>> This is not a REST-y endpoint, it is a simple state machine operated
>>>>>>> by HTTP POST with JSON bodies that have an `action` field.
>>>>>>>
>>>>>>> ### State 1: No Cluster Enabled
>>>>>>>
>>>>>>> This is right after starting a node for the first time, and any time
>>>>>>> before the cluster is enabled as outlined above.
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state": "cluster_disabled"}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Transition to State 2
>>>>>>> POST /_setup {"action":"enable_cluster"...} with empty admin user/pass
>>>>>>> or invalid host/port or host/port not available -> Error
>>>>>>> POST /_setup {"action":"anything_but_enable_cluster"...} -> Error
>>>>>>>
>>>>>>
>>>>>> If "enable_cluster" only creates the admin and sets the bind address,
>>>>>> could this step be skipped? The same actions are possible via
>>>>>> regular config setup.
>>>>>
>>>>> Yes! It just needs to ensure these things are done. If Fauxton detects
>>>>> they *are* done, it can skip the enable step and show the add_node
>>>>> interface right away.
>>>>
>>>> Good!
>>>>
>>>>>>
>>>>>>
>>>>>>> ### State 2: Cluster enabled, admin user set, waiting for nodes to be added.
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state":"cluster_enabled","nodes":[]}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>>>> POST /_setup {"action":"add_node"...} -> Stay in State 2, but return
>>>>>>> "nodes":["node B"] on GET
>>>>>>> POST /_setup {"action":"add_node"...} -> if target node not available,
>>>>>>> Error
>>>>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>>>>> POST /_setup {"action":"finish_cluster"} -> Transition to State 3
>>>>>>>
>>>>>>
>>>>>> Questions:
>>>>>> - How many nodes are required to be added? 1? 2? 3?...
>>>>>
>>>>> Doesn’t matter.
>>>>
>>>> Then a case:
>>>> POST /_setup {"action":"finish_cluster"} with no nodes set up -> Error
>>>>
>>>> will never happen, since there will always be at least one node in the
>>>> cluster - the one that runs the setup (:
>>>
>>> From Fauxton, yes, but an API user could just call enable_cluster and
>>> then finish_cluster, and they should get an appropriate error :)
>>>
>>>
>>>>>> - How to remove accidentally added node from cluster?
>>>>>
>>>>> Delete from /nodes database. Could be added as a UI element in Fauxton.
>>>>
>>>> That's what I worried about: we add nodes via /_setup, but have to
>>>> remove them via /nodes. Consistency has to be preserved (:
>>>
>>> Now that I know about /nodes being on :5986 only, /_setup needs a
>>> remove_node action as well. Thanks for flagging that :)
>>>
>>>
>>>
>>>>
>>>>
>>>>>>
>>>>>>> ### State 3: Cluster set up, all nodes operational
>>>>>>>
>>>>>>> GET /_setup
>>>>>>> {"state":"cluster_finished","nodes":["node a", "node b", ...]}
>>>>>>>
>>>>>>> POST /_setup {"action":"enable_cluster"...} -> Error
>>>>>>> POST /_setup {"action":"finish_cluster"...} -> Stay in State 3, do 
>>>>>>> nothing
>>>>>>> POST /_setup {"action":"add_node"...} -> Error
>>>>>>> POST /_setup?i_know_what_i_am_doing=true {"action":"add_node"...} -> 
>>>>>>> Add node, stay in State 3.
>>>>>>>
>>>>>>> // TBD: we need to persist the setup state somewhere.
>>>>>>>
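
The three states and their transitions can be sketched as a small in-memory state machine. This is a minimal sketch under stated assumptions, not the code in couchdb-setup: the class name `SetupStateMachine` and the exception `SetupError` are made up, the enable_cluster body is assumed to carry the same `admin` object as add_node, the `nodes` list only records added hosts, and host/port validation, node availability checks and persistence of the state (the TBD above) are all left out.

```python
class SetupError(Exception):
    """Raised wherever the proposal says '-> Error'."""


class SetupStateMachine:
    def __init__(self):
        self.state = "cluster_disabled"
        self.nodes = []

    def get(self):
        """Mimic GET /_setup."""
        if self.state == "cluster_disabled":
            return {"state": self.state}
        return {"state": self.state, "nodes": list(self.nodes)}

    def post(self, body, i_know_what_i_am_doing=False):
        """Mimic POST /_setup; return the new GET view or raise SetupError."""
        action = body.get("action")
        if self.state == "cluster_disabled":
            if action != "enable_cluster":
                raise SetupError("only enable_cluster is allowed in State 1")
            admin = body.get("admin") or {}
            if not admin.get("user") or not admin.get("pass"):
                raise SetupError("empty admin user/pass")
            self.state = "cluster_enabled"
        elif self.state == "cluster_enabled":
            if action == "add_node":
                self._add_node(body)
            elif action == "finish_cluster":
                if not self.nodes:
                    raise SetupError("no nodes set up")
                self.state = "cluster_finished"
            else:
                raise SetupError(f"{action!r} is not allowed in State 2")
        else:  # cluster_finished
            if action == "finish_cluster":
                pass  # stay in State 3, do nothing
            elif action == "add_node" and i_know_what_i_am_doing:
                self._add_node(body)
            else:
                raise SetupError(f"{action!r} is not allowed in State 3")
        return self.get()

    def _add_node(self, body):
        host = (body.get("node") or {}).get("host")
        if not host:
            raise SetupError("missing node host")
        self.nodes.append(host)
```

Keeping every transition in one `post` method mirrors the single POST endpoint with an `action` field, so the error cases listed per state map onto one branch each.
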
>>>>>>
>>>>>> Questions:
>>>>>> - Why is adding a new node after finish_cluster a special case that
>>>>>> has to be marked with the "i_know_what_i_am_doing" parameter?
>>>>>
>>>>> Because I think it is not advisable to do this regularly, but someone
>>>>> might want to do this regardless (see next).
>>>>>
>>>>>
>>>>>> - How to enlarge / reduce the cluster after its setup, or even disband it?
>>>>>
>>>>> Enlarge: see above.
>>>>> Reduce: delete from /nodes
>>>>> Disband: shut down all CouchDB processes :)
>>>>>
>>>>> I don’t know the BigCouch/Cloudant best practices for this. I’ll chalk
>>>>> this down as a “needs input from Cloudant people” :)
>>>>>
>>>>
>>>> You sent me into recursion with the (see next) and (see above) notes! Nice
>>>> trick, but it is still unclear how to let your cluster grow - this isn't
>>>> some exceptional case. Reducing (intentionally, not due to network issues)
>>>> is what would happen more rarely. +1 for having more info from
>>>> Cloudant people (:
>>>>
>>>>
>>>>>> Or isn't this something /_setup should care about?
>>>>>
>>>>> In general, this isn’t really covered by the setup proposal here. I’d like
>>>>> to keep this out of scope for now, but we should have good answers to that
>>>>> going forward.
>>>>
>>>> Agreed.
>>>>
>>>>>> - What happens with /_setup resource after finish_cluster? Any case
>>>>>> for it to be useful?
>>>>>
>>>>> Only for Fauxton to show the correct setup state.
>>>>>
>>>>
>>>> If so, then I just figured out a better name for it: /_cluster
>>>> - it sets up the cluster as you planned
>>>> - it shows the cluster state as you planned
>>>> - it allows managing cluster nodes in ways that aren't suitable
>>>> for the /nodes API (like setting cookies)
>>>> - it remains useful after cluster setup
>>>> and it could handle other cluster-wide tasks.
>>>>
>>>> What do you think?
>>>
>>> I’m not attached to any particular name; /_cluster or /_cluster_setup
>>> both work for me :)
>>>
>>>
>>>>>> - How could /_setup help with changing the admin password across all
>>>>>> cluster nodes?
>>>>>
>>>>> At least on the first run setup, Fauxton can just keep the new password in
>>>>> memory and pre-fill the add_node screens with the same username and
>>>>> password. /_setup then transports it over.
>>>>>
>>>>> For later setups, I don’t know, as we would need the admin to enter the
>>>>> password in plaintext so we can send it. Alternatively, we could use the
>>>>> /_config API to read and send the PBKDF2 hash *waves hands*.
>>>>>
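
To make the "send the PBKDF2 hash instead of the plaintext" idea concrete: a password can be hashed once, and the resulting salt, iteration count and derived key can then be copied to other nodes, each of which can still verify a login. The sketch below uses Python's standard `hashlib.pbkdf2_hmac`; the function names, the SHA-1 choice, the iteration count and the (salt, iterations, key) triple are illustrative assumptions and are not claimed to match the format CouchDB stores in its config.

```python
import hashlib
import hmac
import os


def derive(password, salt=None, iterations=10000):
    """Return (salt_hex, iterations, key_hex) using PBKDF2-HMAC-SHA1."""
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations)
    return salt.hex(), iterations, key.hex()


def verify(password, salt_hex, iterations, key_hex):
    """Check a password against a stored triple, as any node holding it could."""
    _, _, candidate = derive(password, bytes.fromhex(salt_hex), iterations)
    return hmac.compare_digest(candidate, key_hex)
```

Because the triple is deterministic for a given salt, shipping it to node B gives node B the same admin password without the plaintext ever crossing the wire.
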
>>>>
>>>> "Fauxton can just keep the new password in memory" opens the door to
>>>> trouble when you accidentally refresh the page / close the tab / lose
>>>> page memory in some other way. Not a flaw, just a case to remember.
>>>>
>>>> As for sending (or replicating) the PBKDF2 hash, that looks good to me.
>>>
>>> Yeah, that sounds like a more robust solution. We’d need something
>>> similar for the HTTP auth secret.
>>
>> Meh, catch-22 again. Since /_setup is admin-only and we have already set up
>> the target node, we will need the cluster password in plain text added there.
>> We could mitigate some of the problems you outline by storing the password
>> in localStorage until we finish_cluster (or a timeout, whichever occurs
>> first).
>>
>>>
>>>
>>>>>> - If I add a new node after "finish_cluster" setup, will it have all
>>>>>> system databases (global_changes, cassim, _users...whatever else)
>>>>>> created?
>>>>>
>>>>> That is unspecified at this point. I’d need more input from the Cloudant
>>>>> people on this one. I’m happy to go either way, or make it an option for
>>>>> later joined nodes.
>>>>
>>>> Ok. Let's wait for what the Cloudant people say. I'm pretty sure they
>>>> already know the solution to all these problems, or at least know their
>>>> specifics.
>>>>
>>>> Thanks a lot, Jan! (:
>>>
>>> Thank you Alex, this helps a lot! :)
>>>
>>> * * *
>>>
>>> I’ll start putting this proposal into the wiki, so we can see it evolve from
>>> now on.
>>>
>>> * * *
>>>
>>> Best
>>> Jan
>>> --
>
