Ah ok.. hmm, I'm afraid that if you don't put the name of the node that
must handle execution inside the document, you will sooner or later have
to deal with duplicates, because this kind of trigger does not guarantee
atomicity.. I mean, if nodes are continuously replicated, nothing
prevents two nodes from seeing the same trigger as not-yet-executed at
the same time and both starting its execution.
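
To make the race concrete, here is a minimal sketch (plain node.js with
invented names like workerTick; this is not CouchDB code, just a model of
the check-then-run sequence not being atomic across nodes):

```javascript
// Two replicated nodes each poll the same not-yet-executed trigger.
// Because "check state, then execute, then mark done" is not atomic
// across nodes, both run the task before either write replicates.
const doc = { _id: "task1", state: "pending" }; // replicated on both nodes
let executions = 0;

function workerTick(localCopy) {
  // Each node looks at its own replicated copy of the document.
  if (localCopy.state === "pending") {   // 1. check
    executions += 1;                     // 2. execute (send email, etc.)
    localCopy.state = "COMPLETED";       // 3. mark done (replicates later)
  }
}

// Both nodes read "pending" before either completion has replicated:
const copyOnNode1 = { ...doc };
const copyOnNode2 = { ...doc };
workerTick(copyOnNode1);
workerTick(copyOnNode2);
console.log(executions); // prints 2: the task ran twice
```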

I now remember another use case where we needed a config parameter inside
an update function: the 'secret' parameter, which is also used in the
chatty tutorial to implement role-based authentication. Currently we work
around this by passing the secret to update functions inside the vhost
path, but accessing config parameters from the req object would probably
be more elegant.
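
As a sketch of what I mean (the update function shape follows CouchDB's
function(doc, req) convention, but addToken, the token field, and the
commented-out req.config access are all hypothetical names, not an
existing API):

```javascript
// Sketch of a CouchDB-style update function. Today the secret arrives as
// a query parameter injected by the vhost/rewrite rule; req.config below
// is a HYPOTHETICAL field that would expose config values directly.
function addToken(doc, req) {
  // current workaround: the vhost rewrite puts the secret into the query
  var secret = req.query.secret;
  // proposed: read it from the request object instead, e.g.
  // var secret = req.config.function_parameters.secret;
  doc.token = secret + ":" + doc._id;
  return [doc, { body: "ok" }];
}

// Quick check with mocked doc/req objects:
var out = addToken({ _id: "u1" }, { query: { secret: "s3cr3t" } });
console.log(out[0].token); // "s3cr3t:u1"
```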

I think that config parameters inside list/update/rewrite functions are
probably one of those features that can enable a lot of hidden
possibilities which today we simply can't grasp.
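
As one example, the node-marking idea discussed below could pair with an
ordinary _changes filter; this is only a sketch, and the executor field
and node query parameter are names I invented for illustration:

```javascript
// Hypothetical CouchDB filter function: each worker asks _changes only
// for tasks allocated to its own node name, e.g.
//   GET /db/_changes?filter=tasks/by_node&node=node1
var filters = {
  by_node: function (doc, req) {
    return doc.type === "task" &&
           doc.state !== "COMPLETED" &&
           doc.executor === req.query.node; // set by the update function
  }
};

// Mocked check: node1 sees its task, node2 does not.
var task = { type: "task", state: "pending", executor: "node1" };
console.log(filters.by_node(task, { query: { node: "node1" } })); // true
console.log(filters.by_node(task, { query: { node: "node2" } })); // false
```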

Do you think the performance penalty would be so high, even if we limited
it to a single config section? I mean something like
[function_parameters].
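
For instance (purely hypothetical, no such section exists in CouchDB
today; the keys are placeholders):

```ini
; hypothetical section, exposed read-only to list/update/rewrite functions
[function_parameters]
secret = some_shared_secret
node_name = node1
```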

However, even if this turned out to be feasible, I wouldn't consider it a
primary feature, of course. Take this as just an evaluation of future
feasibility.

Thanks!!

--Giovanni
On 14 Nov 2015 at 15:14, "Alexander Shorin" <[email protected]> wrote:

> No, I mean that you process each db's changes separately with single or
> multiple workers, but use another single db that tracks worker tasks
> per db and helps to avoid duplicate tasks, since all your dbs are in
> sync.
> --
> ,,,^..^,,,
>
>
> On Fri, Nov 13, 2015 at 3:03 PM, Giovanni Lenzi <[email protected]>
> wrote:
> >> Every nodeX will have the same "notification" process, which is
> >> listening to dbX/_changes.
> > sorry, by "same" here I mean the same type of process, but obviously
> > one instance of it running on each node
> >
> > --Giovanni
> >
> > 2015-11-13 13:00 GMT+01:00 Giovanni Lenzi <[email protected]>:
> >
> >> I'm not sure I understood correctly.. what you mean is:
> >>
> >> I create 3 nodes:
> >> node1 with single database named db1
> >> node2 with single database named db2
> >> node3 with single database named db3
> >>
> >> Then I create 3 continuous replications: db1 <-> db2, db1 <-> db3,
> >> db2 <-> db3
> >>
> >> Every nodeX will have the same "notification" process, which is
> >> listening to dbX/_changes.
> >>
> >> What you mean is then: "I use db_name as the filter instead of
> >> node_name, given that every nodeX will have one and only one database
> >> dbX". Right?
> >>
> >>
> >>
> >> --Giovanni
> >>
> >> 2015-11-13 11:44 GMT+01:00 Alexander Shorin <[email protected]>:
> >>
> >>> On Fri, Nov 13, 2015 at 1:28 PM, Giovanni Lenzi <[email protected]>
> >>> wrote:
> >>> >> No, slow is gathering all the stats. Especially in cluster. The
> >>> >> db_name you can get from req.userCtx without problem.
> >>> >>
> >>> >
> >>> > Does req.userCtx currently also contain db_name? I thought it was
> >>> > only for user data (username and roles). Are you saying that it is
> >>> > possible to get db_name alone, or are you forced to fetch the
> >>> > entire set?
> >>> >
> >>>
> >>> not db_name exactly, but:
> >>>
> >>>     "userCtx": {
> >>>         "db": "mailbox",
> >>>         "name": "Mike",
> >>>         "roles": [
> >>>             "user"
> >>>         ]
> >>>     }
> >>>
> >>>
> >>> >> > Also I was wondering how heavy it would be to include some kind
> >>> >> > of machine identifier (hostname or IP address of the machine
> >>> >> > running CouchDB) inside the request object?
> >>> >>
> >>> >> What is the use case for this? Technically, req.headers['Host']
> >>> >> points to the requested CouchDB.
> >>> >>
> >>> >> > Or, if you want to make it even more flexible: how heavy would
> >>> >> > it be to include a configuration parameter inside the request
> >>> >> > object?
> >>> >> >
> >>> >> > That could be of great help in some N-node master-master
> >>> >> > redundant database configurations, to let only one node (the
> >>> >> > write node) handle some specific background action.
> >>> >>
> >>> >> Can you describe this problem a little bit more? How would this
> >>> >> configuration parameter be used, and what would it be?
> >>> >>
> >>> >>
> >>> > Ok, let's consider a 2-node setup with master-master replication
> >>> > set up and a round-robin load-balancer in front of them. In normal
> >>> > conditions, with master-master replication you can balance both
> >>> > read and write requests to every node, right?
> >>> >
> >>> > Now, let's say we also need backend services (email, SMS,
> >>> > payments) via some plugin or node.js process (like triggerjob).
> >>> > These react to database _changes, execute some background task and
> >>> > then update the same document with a COMPLETED state. The drawback
> >>> > is that, in an N-node configuration, every node is going to
> >>> > execute the same background tasks (2 or N emails will be sent
> >>> > instead of 1, 2 payment transactions instead of 1, and so on).
> >>> >
> >>> > Ok, you may say, with haproxy you can balance only reads (GET,
> >>> > HEAD) and use only one node for writes. But what if the write node
> >>> > goes down? I won't be able to write anymore, only read.
> >>> >
> >>> > BUT we can probably do better.. let's step back to balancing both
> >>> > reads and writes. If we had a way to specify, in the update
> >>> > function itself, which node is in charge of executing those tasks,
> >>> > they could be executed only once! A trivial but efficient solution
> >>> > which comes to my mind is: let the backend task be handled by the
> >>> > node which received the write request. If the update function
> >>> > knows some kind of machine identifier (or a previously set-up
> >>> > configuration parameter), it could mark the task in the document
> >>> > itself with the name of the machine responsible for its execution.
> >>> > The plugin or node.js process would then execute only the tasks
> >>> > allocated to it, by simply using a filtered _changes request with
> >>> > its own node name.
> >>> >
> >>> > This solution has the benefit of letting system administrators
> >>> > have N identical nodes (same data, same ddocs and configuration,
> >>> > only the node name differs) which balance read requests, write
> >>> > requests and backend task processing. You could then scale out by
> >>> > simply spawning a new node from the same Amazon AMI, for example.
> >>> >
> >>> > Am I missing something?
> >>>
> >>> That's what 2.0 is going to solve (:
> >>>
> >>> For 1.x I would use the following configuration:
> >>>
> >>> db1 --- /_changes --\
> >>> db2 --- /_changes ---> notification-process -> notification-db
> >>> dbN --- /_changes --/
> >>>
> >>> In the notification db you store all the tasks that need to be done
> >>> and those already done. Since your db1, db2, dbN are in sync, their
> >>> changes feeds will eventually produce similar events, which you'll
> >>> have to filter using your notification-db data.
> >>>
> >>> --
> >>> ,,,^..^,,,
> >>>
> >>
> >>
>
