Today, someone came to the #couchdb channel asking about
_externals. For a long while it's been on my mind that perhaps
we should deprecate the entire _externals feature for a number
of reasons:

  1. Couch is not a great reverse proxy. Turning it into one
     would be as hard as rewriting nginx or haproxy in Erlang.
     It's a distraction for our development team and far outside
     our core competency.

  2. In a clustered CouchDB (the default in 2.0), the
     assumptions around externals change drastically. For an
     _external to work, it must be stateless and must not rely
     on multiple sequential requests hitting the same node
     (assuming the standard n-node cluster with a load
     balancer/reverse proxy in front).

     People who wrote a CouchDB 1.x external could reasonably
     expect to write an old-school singleton app (i.e., the only
     copy of that external process running, on a single machine).

     If they engaged in any of a number of behaviours that are
     bad for distributed systems (storing content on local disk,
     locking or blocking connections to other services/databases
     in a "single-threaded" pattern, or even assuming that
     CouchDB will never introduce a conflict or that it
     guarantees "read your writes"), then at best their externals
     will fail outright, and at worst they will introduce subtle
     and confusing behaviour.

TL;DR: We're changing the contract we give to _externals in a
backwards-incompatible way. We either need to document that
straight up, along with all of the admonishments required for
people who expect it to operate the same as in 1.x, or we need to
remove it.

My opinion is that, now that the default CouchDB rollout will be
a cluster behind a reverse proxy, _externals should be exposed
through the load balancer, which can then route to one or more
processes running either on the CouchDB nodes themselves or on
separate hosts, should compute needs demand it.

The one possible exception would be a single-node CouchDB, but
even that could use the same approach: I don't see a problem with
deploying haproxy on that same node and setting it up as I
describe above.

Thoughts, comments, suggestions?

-Joan
