Hi folks,

Last week we got a new API endpoint:

```
_node/<fqdn>/_config
```

Using this endpoint you can reach every node in a cluster by
specifying the name of the node.
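
As a rough illustration, the per-node URL could be built like this in
Python. The base URL, the node name and the helper name
`node_config_url` are assumptions made up for the example; only the
`_node/<fqdn>/_config` path shape comes from the endpoint above.

```python
from urllib.parse import quote


def node_config_url(base_url: str, node_name: str) -> str:
    """Build the URL of the per-node config endpoint.

    base_url and node_name are placeholders for this sketch; the node
    name is percent-encoded so it is safe to use as a path segment.
    """
    # Keep "@" readable, encode anything else that is unsafe in a path.
    segment = quote(node_name, safe="@")
    return f"{base_url.rstrip('/')}/_node/{segment}/_config"


url = node_config_url("http://localhost:5984/", "couchdb@node1.example.com")
print(url)
```

Looping over the node names with such a helper is what "reaching every
node" amounts to from the client side.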

This sounds useful to me for Fauxton, as the Fauxton team wants to
keep the ability to configure CouchDB through Fauxton like in 1.x,
where you can change the config of a CouchDB that is running in
production.

Over the last few days I worked with the API in Fauxton and had to
explain the new config section in Fauxton to several people. While
doing that I became more and more aware that it does not really work
well for multi-node setups.

Here are the things I found out or had to explain:

- the feature is not intended for more than 5, maybe 10 nodes, as it
is not feasible for the user and also gets more and more error-prone
the more nodes we have in the cluster (e.g. network partitions)

- for all other settings, the cluster is in a state where the configs
on the nodes differ for a while, maybe up to 10 minutes for a 10-node
cluster that gets a new configuration manually by clicking through the
nodes in the UI. For a change of the Basic Auth settings that means
that the user (a developer using CouchDB) has to add a lot of code to
the client that uses CouchDB, just to handle the inconsistent cluster

- when we try to update all nodes at once using multiple AJAX
requests, the cluster may be inconsistent for a few seconds. While
this is also a problem, it becomes a real problem when we try to
change sections like the admin config, where Fauxton gets a 401 at
some point. The 401 happens because the node we are talking to with
our JS has already received the new password and applied the change.
This problem looks different when talking directly to a node than when
talking to it behind a load balancer (as the load balancer shuffles
our requests to /_session)

- even if we solved the previous problem for Basic Auth, the admin
sections and the login, the cluster is still inconsistent when we try
to update all nodes at once and one request fails. I think this is the
same reason why we are not using Erlang RPC calls behind the scenes to
send configuration updates to all nodes. For the user (dev) it would
also mean that they have to add a lot of logic to their clients to
handle the case that the config change of the admin fails temporarily
for one node, e.g. because an HTTP request timed out.
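
The partial-failure case from the last point can be sketched in a few
lines of Python. This is only a simulation under stated assumptions:
`put_setting` is a hypothetical callable that sends the change to one
node and raises on failure, and the node names are invented.

```python
from typing import Callable, Dict, List


def update_all_nodes(
    nodes: List[str],
    put_setting: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Try to apply one config change on every node.

    put_setting(node) is a hypothetical callable that sends the change
    to a single node and raises an exception on failure (timeout, 401).
    """
    updated: List[str] = []
    failed: List[str] = []
    for node in nodes:
        try:
            put_setting(node)
            updated.append(node)
        except Exception:
            # This node keeps the old value: the cluster is inconsistent.
            failed.append(node)
    return {"updated": updated, "failed": failed}


def flaky_put(node: str) -> None:
    # Simulated transport: the request to node2 times out.
    if node == "node2":
        raise TimeoutError("request timed out")


result = update_all_nodes(["node1", "node2", "node3"], flaky_put)
print(result)  # {'updated': ['node1', 'node3'], 'failed': ['node2']}
```

Whenever `failed` is not empty, some caller has to retry or roll back,
and that is exactly the logic that would end up in Fauxton or in the
user's client.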


Here is the proposal for the config section in Fauxton:

Detect if we are running in "Single Node Mode". This can be an N=0
setting which was set by the setup wizard that is coming to Fauxton,
if the user chooses that they don't want to set up a cluster - or it
can be a node count of 1 in /_membership.

Only if that is the case do we display the config section, as only
then can we guarantee that the config and the login work for the user.
If we detect that we have multiple nodes, we display an info message
with our suggested way to change the config for clusters.
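
A minimal sketch of that decision in Python, assuming the membership
response has already been fetched and parsed, and assuming it carries a
`cluster_nodes` list of node names. The function names and the two view
names are hypothetical and only serve the example.

```python
from typing import Any, Dict


def is_single_node(membership: Dict[str, Any]) -> bool:
    """Return True if the membership document lists exactly one node.

    Assumes the document has a "cluster_nodes" list of node names.
    """
    return len(membership.get("cluster_nodes", [])) == 1


def config_view(membership: Dict[str, Any]) -> str:
    """Pick what the config section shows (view names are made up)."""
    return "config-editor" if is_single_node(membership) else "cluster-info"


single = {"cluster_nodes": ["couchdb@localhost"]}
multi = {"cluster_nodes": ["couchdb@node1", "couchdb@node2", "couchdb@node3"]}

print(config_view(single))  # config-editor
print(config_view(multi))   # cluster-info
```

A missing or empty node list falls through to the info message, so the
editor is only shown when we are sure there is a single node.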

If a node has not yet been joined into a 50-node cluster, there is no
use case for configuring it with Fauxton, as such nodes will be
managed automatically. Even then, an admin could use the UI to copy
the config bits over to the new node until it is joined. Until then,
and also after the join (given the admin copied all config sections
properly), the UI stays usable (no random 401s).

The new endpoint would still be useful for ad-hoc HTTP queries to find
out the config of a given node. If it turns out not to be useful, we
could remove it later and would have learned more about how our users
(admins, devs etc.) use CouchDB.

This way we can keep the config section for small setups, which will
also make up a fair share of Couch 2 installations, provide a reliable
UI with the same high quality as in the past, and have a way to find
out the config of a node using HTTP on the cluster interface.

Best,
Robert
