No, I mean that you process each db's changes separately with a single worker or multiple workers, but use another single db that tracks worker tasks per db and helps avoid duplicate tasks, since all your dbs are in sync.

--
,,,^..^,,,
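A minimal sketch of that layout (a hypothetical example, assuming Node.js 18+ with the global fetch API; the names COUCH, SOURCE_DB and TRACKING_DB are illustrative, not from this thread): each node runs a worker against its own database's _changes feed, but every worker writes a "claim" document into the one shared tracking db, so the 409 Conflict on an already existing _id guarantees each task is executed only once.

// Hypothetical worker sketch, not an official CouchDB tool.
const COUCH = 'http://localhost:5984';
const SOURCE_DB = 'db1';            // this worker's own database
const TRACKING_DB = 'task-tracker'; // single db shared by all workers

async function claim(docId) {
  // The tracking doc _id is derived from the source doc _id, so the same task
  // arriving via db1, db2, ... dbN maps to one tracking doc. Only the first
  // PUT returns 201; every other worker gets 409 Conflict and skips the task.
  const res = await fetch(`${COUCH}/${TRACKING_DB}/task:${encodeURIComponent(docId)}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: SOURCE_DB, state: 'IN_PROGRESS' }),
  });
  return res.status === 201;
}

async function run() {
  let since = 0;
  for (;;) {
    // Long-poll this node's own changes feed.
    const url = `${COUCH}/${SOURCE_DB}/_changes?feed=longpoll&since=${since}&timeout=30000`;
    const { results, last_seq } = await (await fetch(url)).json();
    for (const change of results) {
      if (await claim(change.id)) {
        console.log(`executing background task for ${change.id}`);
        // ... send the email / run the payment, then mark the task COMPLETED ...
      }
    }
    since = last_seq;
  }
}

run().catch(console.error);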
On Fri, Nov 13, 2015 at 3:03 PM, Giovanni Lenzi <[email protected]> wrote:
>> Every nodeX will have the same "notification" process, which is listening
>> to dbX/_changes.
>
> sorry, with "same" here I mean the same type of process, but obviously one
> instance of it running on each node
>
> --Giovanni
>
> 2015-11-13 13:00 GMT+01:00 Giovanni Lenzi <[email protected]>:
>
>> not sure I understood correctly.. what you mean is:
>>
>> I create 3 nodes:
>> node1 with a single database named db1
>> node2 with a single database named db2
>> node3 with a single database named db3
>>
>> Then I create 3 continuous replications: db1 <-> db2, db1 <-> db3, db2 <-> db3
>>
>> Every nodeX will have the same "notification" process, which is listening
>> to dbX/_changes.
>>
>> What you mean is then: "I use db_name as the filter instead of node_name,
>> given that every nodeX will have one and only one single database dbX".
>> Right?
>>
>> --Giovanni
>>
>> 2015-11-13 11:44 GMT+01:00 Alexander Shorin <[email protected]>:
>>
>>> On Fri, Nov 13, 2015 at 1:28 PM, Giovanni Lenzi <[email protected]> wrote:
>>> >> No, slow is gathering all the stats. Especially in a cluster. The
>>> >> db_name you can get from req.userCtx without problem.
>>> >
>>> > Does req.userCtx also contain db_name currently? I thought it was only
>>> > for user data (username and roles). Are you saying that it is possible
>>> > to gather db_name only, or are you forced to fetch the entire set?
>>>
>>> not db_name exactly, but:
>>>
>>> "userCtx": {
>>>     "db": "mailbox",
>>>     "name": "Mike",
>>>     "roles": [
>>>         "user"
>>>     ]
>>> }
>>>
>>> >> > Also I was wondering how heavy it could be to include some kind of
>>> >> > machine identifier (hostname or IP address of the machine running
>>> >> > CouchDB) inside the request object?
>>> >>
>>> >> What is the use case for this? Technically, req.headers['Host'] points
>>> >> to the requested CouchDB.
>>> >>
>>> >> > Or, if you want to make it even more flexible: how heavy could it be
>>> >> > to include a configuration parameter inside the request object?
>>> >> >
>>> >> > That could be of great help in some N-node master-master redundant
>>> >> > database configurations, to let one node only (the write node) handle
>>> >> > some specific background action.
>>> >>
>>> >> Can you describe this problem a little bit more? How could this
>>> >> configuration parameter be used, and what would it be?
>>> >
>>> > Ok, let's consider a 2-node setup with master-master replication set up
>>> > and a round-robin load balancer in front of them. In normal conditions,
>>> > with master-master replication you can balance both read and write
>>> > requests to every node, right?
>>> >
>>> > Now, let's say we also need backend services (email, SMS, payments),
>>> > using some plugin or node.js process (like triggerjob). These react to
>>> > database _changes, execute some background task and then update the same
>>> > document with a COMPLETED state. The drawback is that, in an N-node
>>> > configuration, every node is going to execute the same background tasks
>>> > (2 or N emails will be sent instead of 1, 2 payment transactions instead
>>> > of 1, and so on).
>>> >
>>> > Ok, you may say, with haproxy you can balance only reads (GET, HEAD) and
>>> > use one node only for writes. But what if the write node goes down? I
>>> > won't have the chance to write anymore, only read.
>>> >
>>> > BUT we can probably do better.. let's step back to balancing both reads
>>> > and writes. If we have a way to specify, in the update function itself,
>>> > which node is in charge of executing those tasks, they could then be
>>> > executed only once! A trivial but efficient solution which comes to my
>>> > mind is: let the backend task be handled by the node that received the
>>> > write request. If the update function knows some kind of machine
>>> > identifier (or a configuration parameter previously set up), it could
>>> > mark the task in the document itself with the name of the machine
>>> > responsible for its execution. The plugin or node.js process would then
>>> > execute only the tasks allocated to it, by simply using a filtered
>>> > _changes request with its own node name.
>>> >
>>> > This solution has the benefit of letting system administrators run N
>>> > identical nodes (same data, same ddocs and configuration, only the node
>>> > name differs) which balance read requests, write requests and backend
>>> > task processing. In this way you could then scale out by simply spawning
>>> > a new node from the same Amazon AMI, for example.
>>> >
>>> > Am I missing something?
>>>
>>> That's what 2.0 is going to solve (:
>>>
>>> For 1.x I would use the following configuration:
>>>
>>> db1 --- /_changes --\
>>> db2 --- /_changes ---> notification-process -> notification-db
>>> dbN --- /_changes --/
>>>
>>> In the notification db you store all the tasks that need to be done and
>>> those that are already done. Since your db1, db2, dbN are in sync, their
>>> changes feeds will eventually produce similar events, which you'll have
>>> to filter using your notification-db data.
>>>
>>> --
>>> ,,,^..^,,,
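The filtered-_changes half of Giovanni's proposal in the quoted thread is already possible today; what 1.x lacks is only the node identifier itself. A hypothetical design document could look like the sketch below (the assigned_node field and the node query parameter are assumptions, and something outside CouchDB, e.g. the proxy or the writing client, would still have to stamp assigned_node onto the document).

{
  "_id": "_design/tasks",
  "filters": {
    "by_node": "function (doc, req) { return doc.assigned_node === req.query.node && doc.state !== 'COMPLETED'; }"
  }
}

Each node's plugin or node.js process would then follow something like GET /db1/_changes?feed=continuous&filter=tasks/by_node&node=node1, seeing only the tasks assigned to it.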
