+1 on round-robin continuous replication. Ideally it'd let each replication 
specify a relative priority, e.g. replication of log alerts or chat messages 
may need lower latency (higher priority) than a ddoc deployment or user 
backup. For now I'm just going to implement my own duct-tape version of this, 
using cron jobs to trigger non-continuous replications.
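
In case it's useful, the duct-tape version is roughly this: a tiny script, 
run from cron, that POSTs a one-shot replication to _replicate. (A sketch 
only; the database names and the app/by_user filter are placeholders for 
your own setup.)

    // replicate.js -- run from cron, e.g.: */5 * * * * node replicate.js
    var http = require('http');

    var body = JSON.stringify({
      source: 'master',
      target: 'userdb-42',                // placeholder target db
      filter: 'app/by_user',              // placeholder filter name
      query_params: { user: '42' }
    });

    var req = http.request({
      host: 'localhost', port: 5984, path: '/_replicate', method: 'POST',
      headers: { 'Content-Type': 'application/json' }
    }, function (res) {
      res.pipe(process.stdout);           // prints the replication summary
    });
    req.end(body);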

FWIW, with my client's permission I'm sharing the script I've been using to 
load-test continuous filtered replication to/from a central master: 
https://gist.github.com/natevw/4711127

The test script sets up N+1 databases, writing documents to one as the master 
while replicating to the other N, and also "short-polls" each database's 
_changes feed to roughly simulate general load on top of the long polling the 
application does. On OS X I can only get to around 250 users due to Erlang's 
FD_SETSIZE limit, but it remains stable if I keep it under that limit. 
However, it takes the user database replications a *long* time to all get 
caught up (some don't even start until a few minutes after the changes stop).
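
The short-polling boils down to something like the loop below (a simplified 
sketch, not the gist verbatim; the database name is again a placeholder):

    // Short-poll a database's _changes feed: fetch changes since the last
    // seen sequence, then wait a bit and ask again.
    var http = require('http');

    var since = 0;
    function poll() {
      http.get('http://localhost:5984/userdb-42/_changes?since=' + since,
               function (res) {
        var buf = '';
        res.on('data', function (chunk) { buf += chunk; });
        res.on('end', function () {
          since = JSON.parse(buf).last_seq;  // resume from here next time
          setTimeout(poll, 1000);            // short-poll interval
        });
      });
    }
    poll();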

hth,
-natevw


On Feb 4, 2013, at 2:50 PM, Robert Newson wrote:

> I had a mind to teach the _replicator db this trick. Since we have a
> record of everything we need to resume a replication, there's no reason
> for a one-to-one correspondence between a _replicator doc and a
> replicator process. We can simply run N of them for a bit (say, a
> batch of 1000 updates) and then switch to others. The internal
> db_updated mechanism is a good way to notice when we might have
> updates worth sending but it's only half the story. A round-robin over
> all _replicator docs (other than one-shot ones, of course) seems a
> really neat trick to me.
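
If I follow, that might look roughly like the sketch below. Purely 
illustrative: runOneBatch is a made-up helper standing in for "run this 
replication until it has processed a bounded batch of updates, then 
checkpoint and stop".

    // Round-robin over the continuous _replicator docs, giving each a
    // bounded one-shot pass. Because replications checkpoint, each pass
    // resumes where the previous pass for that doc left off.
    function roundRobin(replicatorDocs) {
      var queue = replicatorDocs.filter(function (d) {
        return d.continuous;              // skip one-shot docs, as noted
      });
      var i = 0;
      function next() {
        var doc = queue[i++ % queue.length];
        runOneBatch(doc.source, doc.target, next);  // hypothetical helper
      }
      next();
    }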
> 
> B.
> 
> On 4 February 2013 22:39, Jan Lehnardt <[email protected]> wrote:
>> 
>> On Feb 4, 2013, at 23:14 , Nathan Vander Wilt <[email protected]> wrote:
>> 
>>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>>> So I've heard from both hosting providers that it's fine, but I also 
>>>> managed to take both of their shared services "down" with only about 100 
>>>> users (200 continuous filtered replications). I'm only now at the point 
>>>> where I have tooling to build out arbitrarily large tests on my local 
>>>> machine and see the stats for myself, but as I understand it the issue is 
>>>> that every replication needs at least one couchjs process to do its 
>>>> filtering.
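
(For context, the filter in question is just a JavaScript function in a 
design doc, along the lines of this illustrative example; each replication 
passes every candidate change through it inside a couchjs process. Names 
here are made up, not from the actual app.)

    // The replication references this as "filter": "app/by_user" with
    // "query_params": {"user": "42"}.
    {
      "_id": "_design/app",
      "filters": {
        "by_user": "function (doc, req) { return doc.owner === req.query.user; }"
      }
    }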
>>>> 
>>>> So rather than inactive users mostly just taking up disk space, they're 
>>>> instead each costing a full-fledged process's worth of memory and system 
>>>> resources, all the time. As I understand it, this isn't much better on 
>>>> BigCouch either, since the data is scattered ± evenly across the machines: 
>>>> while the *computation* is spread out, each node in the cluster still 
>>>> needs k*numberOfUsers couchjs processes running. So it's "scalable" in 
>>>> the sense that traditional databases are scalable: vertically, by buying 
>>>> machines with more and more memory.
>>>> 
>>>> Again, I am still working on getting a better feel for the costs involved, 
>>>> but the basic theory with a master-to-N hub is not a great start: every 
>>>> change needs to be processed by all N replications. So if a user writes 
>>>> a document that ends up in the master database, every other user's filter 
>>>> function needs to process that change coming out of master. Even when N 
>>>> users are generating 0 (instead of M) changes each, it's not doing M*N 
>>>> work, but there are still always 2*N open connections and supporting 
>>>> processes, a nasty baseline for large values of N.
>>> 
>>> Looks like I was wrong about needing enough RAM for one couchjs process per 
>>> replication.
>>> 
>>> CouchDB maintains a pool of (no more than 
>>> query_server_config/os_process_limit) couchjs processes, and work is 
>>> divvied out amongst these as necessary. I found a little meta-discussion 
>>> of this system at https://issues.apache.org/jira/browse/COUCHDB-1375 and 
>>> the code that implements it is here: 
>>> https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
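
So the knob that matters is os_process_limit which, if I'm reading it right, 
caps the couchjs pool via local.ini, e.g.:

    [query_server_config]
    ; upper bound on concurrent couchjs processes; replication filters,
    ; views, etc. all share this pool
    os_process_limit = 100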
>>> 
>>> On my laptop, I was able to spin up 250 users without issue. Beyond that, I 
>>> start running into ± hardcoded system resource limits that Erlang has under 
>>> Mac OS X, but from what I've seen the only theoretical scalability issue 
>>> with going beyond that on Linux/Windows would be response times, as the 
>>> worker processes become more and more saturated.
>>> 
>>> It still seems wise to implement tiered replication for communicating 
>>> between thousands of *active* user databases, but that seems like a 
>>> reasonable requirement to me.
>> 
>> CouchDB’s design is obviously lacking here.
>> 
>> For immediate relief, I’ll propose the usual jackhammer of unpopular 
>> responses: write your filters in Erlang. (sorry :)
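
(For anyone wanting to try that: if I understand correctly, Erlang filters 
require enabling the native query server in local.ini, something like the 
line below. Note that native views/filters run unsandboxed.)

    [native_query_servers]
    ; enables "language": "erlang" design docs (unsandboxed!)
    erlang = {couch_native_process, start_link, []}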
>> 
>> For the future: we're already seeing progress on improving the view server 
>> situation. Once we get to a more desirable setup (yaynode/v8), we can 
>> improve the view server communication; there is no reason you'd need a 
>> single JS OS process per active replication, and we should absolutely fix 
>> that.
>> 
>> --
>> 
>> Another angle is the replicator. I know Jason Smith has a prototype of this 
>> in Node, and it works. Instead of maintaining N active replications, we just 
>> keep a maximum number of active connections and cycle out the ones that are 
>> currently inactive. The DbUpdateNotification mechanism should make this 
>> relatively straightforward. There is added overhead for setting up and 
>> tearing down replications, but we can make better use of resources and not 
>> clog things up with inactive replications. Especially in a db-per-user 
>> scenario, most replications don't see much of an update most of the time; 
>> they should stay inactive until data is written to any of the source 
>> databases. The mechanics for this are all there in CouchDB; we just need to 
>> write it.
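
If I understand the idea, in sketch form that scheduler might look like 
this. Hypothetical names throughout: runReplication stands in for a 
one-shot, checkpointed replication pass, and onDbUpdated for whatever hook 
the update-notification feed provides.

    var MAX_ACTIVE = 20;   // cap on concurrently running replications
    var active = 0;
    var pending = [];      // replication ids with unreplicated changes

    // Called from the db-update notification feed whenever a source db
    // that some replication watches gets written to.
    function onDbUpdated(replicationId) {
      if (pending.indexOf(replicationId) < 0) pending.push(replicationId);
      drain();
    }

    function drain() {
      while (active < MAX_ACTIVE && pending.length > 0) {
        var id = pending.shift();
        active++;
        runReplication(id, function () {  // one-shot pass, checkpointed
          active--;
          drain();                        // cycle in the next waiting one
        });
      }
    }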
>> 
>> --
>> 
>> Nate, thanks for sharing your findings and for bearing with us, despite 
>> your very understandable frustrations. It is people like you who allow us 
>> to make CouchDB better!
>> 
>> Best
>> Jan
>> --
