No, I mean that you process each db's changes separately with a single
worker or multiple workers, but use another single db that tracks
worker tasks per db and helps avoid duplicate tasks, since all your
dbs are in sync.
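
A minimal sketch of that claim-before-process idea. A plain dict stands
in for the tracking db, and names like `claim_task` and the task-id
scheme are made up for illustration, not CouchDB API (in a real setup
the claim would be an insert into the tracking db, with an _id conflict
meaning "already claimed"):

```python
# Hypothetical sketch: a tracking store that lets N workers, each following
# a different (but synchronized) db's _changes feed, agree on who runs a
# task. A plain dict stands in for the tracking db here.

tracking_db = {}  # task_id -> state ("claimed" or "done")

def claim_task(doc_id, rev):
    """Claim the task for one change event.

    The task id is derived from the doc id and revision, so the same
    logical change arriving from db1, db2, ... dbN maps to one task.
    Returns True only for the first worker that claims it.
    """
    task_id = f"{doc_id}@{rev}"
    if task_id in tracking_db:  # someone already claimed or finished it
        return False
    tracking_db[task_id] = "claimed"
    return True

def process_change(worker, doc_id, rev):
    """Run the background task only if this worker won the claim."""
    if not claim_task(doc_id, rev):
        return f"{worker}: skipped duplicate {doc_id}@{rev}"
    tracking_db[f"{doc_id}@{rev}"] = "done"  # e.g. after sending the email
    return f"{worker}: processed {doc_id}@{rev}"

# The same change shows up on three synced dbs; only one worker acts on it.
results = [process_change(w, "order-42", "1-abc") for w in ("w1", "w2", "w3")]
```

The point is only the ordering: check the tracking db before doing the
work, and record the result in the same place every worker consults.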
--
,,,^..^,,,


On Fri, Nov 13, 2015 at 3:03 PM, Giovanni Lenzi <[email protected]> wrote:
>> Every nodeX will have the same "notification" process, which is
>> listening to dbX/_changes.
>
> Sorry, with "same" here I mean the same type of process, but obviously
> one instance of it, running on each node.
>
> --Giovanni
>
> 2015-11-13 13:00 GMT+01:00 Giovanni Lenzi <[email protected]>:
>
>> Not sure I understood correctly. What you mean is:
>>
>> I create 3 nodes:
>> node1 with single database named db1
>> node2 with single database named db2
>> node3 with single database named db3
>>
>> Then I create 3 continuous replications: db1 <-> db2, db1 <-> db3, db2 <-> db3
>>
>> Every nodeX will have the same "notification" process, which is listening
>> to dbX/_changes.
>>
>> What you mean then is: "Use db_name as the filter instead of node_name,
>> given that every nodeX will have one and only one database, dbX".
>> Right?
>>
>>
>>
>> --Giovanni
>>
>> 2015-11-13 11:44 GMT+01:00 Alexander Shorin <[email protected]>:
>>
>>> On Fri, Nov 13, 2015 at 1:28 PM, Giovanni Lenzi <[email protected]>
>>> wrote:
>>> >> No, slow is gathering all the stats, especially in a cluster. The
>>> >> db_name you can get from req.userCtx without problem.
>>> >>
>>> >
>>> > Does req.userCtx also contain db_name currently? I thought it was
>>> > only for user data (username and roles). Are you saying that it is
>>> > possible to fetch db_name only, or are you forced to fetch the
>>> > entire set?
>>> >
>>>
>>> not db_name exactly, but:
>>>
>>>     "userCtx": {
>>>         "db": "mailbox",
>>>         "name": "Mike",
>>>         "roles": [
>>>             "user"
>>>         ]
>>>     }
>>>
>>>
>>> >> > Also I was wondering: how heavy would it be to include some kind
>>> >> > of machine identifier (hostname or IP address of the machine
>>> >> > running CouchDB) inside the request object?
>>> >>
>>> >> What is the use case for this? Technically, req.headers['Host']
>>> >> points at the requested CouchDB.
>>> >>
>>> >> > Or, if you want to make it even more flexible: how heavy would it
>>> >> > be to include a configuration parameter inside the request object?
>>> >> >
>>> >> > That could be of great help in some N-node redundant master-master
>>> >> > database configurations, to let only one node (the write node)
>>> >> > handle some specific background action.
>>> >>
>>> >> Can you describe this problem a little bit more? How could this
>>> >> configuration parameter be used, and what would it be?
>>> >>
>>> >>
>>> > OK, let's consider a 2-node setup with master-master replication
>>> > and a round-robin load balancer in front of the nodes. In normal
>>> > conditions, with master-master replication you can balance both read
>>> > and write requests to every node, right?
>>> >
>>> > Now, let's say we also need backend services (email, SMS, payments),
>>> > implemented with some plugin or node.js process (like triggerjob).
>>> > These react to database _changes, execute some background task and
>>> > then update the same document with a COMPLETED state. The drawback is
>>> > that, in an N-node configuration, every node is going to execute the
>>> > same background tasks (2 or N emails will be sent instead of 1, 2
>>> > payment transactions instead of 1, and so on).
>>> >
>>> > OK, you may say, with haproxy you can balance only reads (GET, HEAD)
>>> > and use one node only for writes. But what if the write node goes
>>> > down? I won't be able to write anymore, only read.
>>> >
>>> > BUT we can probably do better. Let's step back to balancing both
>>> > reads and writes. If we have a way to specify, in the update function
>>> > itself, which node is in charge of executing those tasks, they could
>>> > then be executed only once! A trivial but efficient solution which
>>> > comes to my mind is: let the backend task be handled by the node that
>>> > received the write request. If the update function knows some kind of
>>> > machine identifier (or a configuration parameter previously set up),
>>> > it could mark the task in the document itself with the name of the
>>> > machine responsible for its execution. The plugin or node.js process
>>> > may then execute only the tasks allocated to it, simply by using a
>>> > filtered _changes request with its own node name.
>>> >
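
A rough sketch of that node-name idea, for illustration only: `NODE_NAME`,
the `task_node` field and all helper names are made up, and a real
deployment would do the stamping in a CouchDB update function and the
selection via a filtered _changes request rather than these in-memory
stand-ins:

```python
# Hypothetical sketch of "tag the task with a node name". The node that
# accepts the write stamps the doc with its own name; each node's
# background process then handles only docs stamped with that name.

NODE_NAME = "node1"  # assumed per-machine identifier (e.g. from config)

def handle_write(doc, receiving_node):
    """Update-function stand-in: stamp the doc with the node that got the write."""
    doc = dict(doc)
    doc["task_node"] = receiving_node
    doc["state"] = "PENDING"
    return doc

def changes_filter(doc, node_name):
    """Filter stand-in: pass only tasks allocated to this node."""
    return doc.get("task_node") == node_name and doc.get("state") == "PENDING"

def run_background_tasks(changes, node_name):
    """Process only the changes this node is responsible for."""
    done = []
    for doc in changes:
        if changes_filter(doc, node_name):
            doc["state"] = "COMPLETED"  # e.g. after sending the email/SMS
            done.append(doc["_id"])
    return done

# Two writes land on different nodes; each node sees the full changes feed
# (the dbs are in sync) but executes only its own tasks.
feed = [
    handle_write({"_id": "mail-1"}, "node1"),
    handle_write({"_id": "mail-2"}, "node2"),
]
processed_here = run_background_tasks(feed, NODE_NAME)
```

Each node runs the same code with a different `NODE_NAME`, which is what
makes the nodes otherwise identical and easy to clone.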
>>> > This solution has the benefit of letting system administrators run N
>>> > identical nodes (same data, same ddocs and configuration; only the
>>> > node name differs) which balance read requests, write requests and
>>> > backend task processing. You could then scale out by simply spawning
>>> > a new node from the same Amazon AMI, for example.
>>> >
>>> > Am I missing something?
>>>
>>> That's what 2.0 is going to solve (:
>>>
>>> For 1.x I would use the following configuration:
>>>
>>> db1 --- /_changes --\
>>> db2 --- /_changes ---> notification-process -> notification-db
>>> dbN --- /_changes --/
>>>
>>> In the notification db you store all the tasks that need to be done
>>> and those already done. Since your db1, db2, dbN are in sync, their
>>> changes feeds will eventually produce similar events, which you'll
>>> have to filter using your notification-db data.
>>>
>>> --
>>> ,,,^..^,,,
>>>
>>
>>
