Hi Alexander,

> No, slow is gathering all the stats. Especially in cluster. The
> db_name you can get from req.userCtx without problem.
>

Does req.userCtx currently also contain db_name? I thought it held only
user data (username and roles). Are you saying that it would be possible
to gather only db_name, or are you forced to fetch the entire set?


>
> > Also I was wondering how heavy could be to include some kind of machine
> > identifier(hostname or ip address of machine running couchdb) inside of
> the
> > request object?
>
> What is use case for this? Technically, req.headers['Host'] points on
> the requested CouchDB.
>
> > Or if you want to make it even more flexible: how heavy could be to
> include
> > a configuration parameter inside of the request object?
> >
> > That could be of great help in some N-nodes master-master redundant
> database
> > configurations, to let one node only(the write node) handle some specific
> > background action.
>
> Can you describe this problem a little bit more? How this
> configuration parameter could be used and what it will be?
>
>
Ok, let's consider a 2-node setup with master-master replication and a
round-robin load balancer in front of it. Under normal conditions, with
master-master replication you can balance both read and write requests
across every node, right?

Now, suppose we also need backend services (email, SMS, payments),
implemented via some plugin or a Node.js process (like triggerjob). These
react to the database _changes feed, execute some background task, and then
update the same document with a COMPLETED state. The drawback is that, in
an N-node configuration, every node is going to execute the same background
tasks (2 or N emails will be sent instead of 1, 2 payment transactions
instead of 1, and so on).
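For illustration, the per-change step of such a worker might look like
this (a minimal sketch; the document shape and field names like doc.state
are my assumptions, not anything triggerjob defines):

```javascript
// Sketch of a worker step: given a changed document, run the side effect
// and return the document marked COMPLETED.
// Field names (state, payload) are illustrative assumptions.
function handleChange(doc, sendEmail) {
  if (doc.state !== "PENDING") {
    return doc; // already handled, nothing to do
  }
  sendEmail(doc.payload); // the background side effect
  doc.state = "COMPLETED"; // written back to the same document
  return doc;
}

// With N identical nodes all running this worker against the same
// _changes feed, the side effect fires N times per document -- the
// duplication problem described above.
```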

Ok, you may say, with haproxy you can balance only reads (GET, HEAD) and
use one node only for writes. But what if the write node goes down? I would
no longer be able to write, only read.
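For reference, that read/write split could be sketched in haproxy roughly
like this (backend and server names are my assumptions):

```
frontend couchdb
    bind *:5984
    acl is_read method GET HEAD
    use_backend read_nodes if is_read
    default_backend write_node

backend read_nodes
    balance roundrobin
    server node1 10.0.0.1:5984 check
    server node2 10.0.0.2:5984 check

backend write_node
    # single point of failure for writes, as noted above
    server node1 10.0.0.1:5984 check
```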

BUT we can probably do better. Let's step back to balancing both reads and
writes. If we had a way to specify, in the update function itself, which
node is in charge of executing those tasks, they could be executed only
once! A trivial but efficient solution that comes to mind is: let the
backend task be handled by the node that received the write request. If the
update function knows some kind of machine identifier (or a configuration
parameter set up previously), it could mark the task in the document itself
with the name of the machine responsible for its execution. The plugin or
Node.js process could then execute only the tasks allocated to it, simply
by issuing a filtered _changes request with its own node name.
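Concretely, the idea could be sketched as a pair of design-document
functions. All names here are assumptions; in particular, the node
identifier would have to come from the machine-identifier/config field this
thread is asking for, so below it is faked via a req.query parameter:

```javascript
// Update function: stamp the document with the node that received the
// write. req.query.node stands in for the proposed machine identifier.
var updateFn = function (doc, req) {
  doc.task_owner = req.query.node; // the node responsible for the task
  doc.state = "PENDING";
  return [doc, "scheduled"];
};

// _changes filter: each worker passes its own node name, so it only
// sees the tasks allocated to it.
var filterFn = function (doc, req) {
  return doc.task_owner === req.query.node;
};
```

Each worker would then poll _changes with filter=<ddoc>/tasks&node=<its
own name>, and a task written through any node is executed exactly once,
by the node that accepted the write.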

This solution has the benefit of letting system administrators run N
identical nodes (same data, same ddocs and configuration; only the node
name differs) which balance read requests, write requests, and backend task
processing. You could then scale out simply by spawning a new node from the
same Amazon AMI, for example.

Am I missing something?

--Giovanni
