+1 on round-robin continuous replication. Ideally it'd let replications specify a relative priority, e.g. replication of log alerts or chat messages might want lower latency (higher priority) than a ddoc deployment or user backup. For now I'm just going to implement my own duct-tape version of this, using cron jobs to trigger non-continuous replications (rough sketch below).
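Here's roughly what that duct-tape approach looks like, as a sketch only: the database names, the user list, and the schedule are made up, and you'd obviously want error handling. Each cron run POSTs one-shot replications to /_replicate, so nothing stays open between runs:

    #!/usr/bin/env python
    # Duct-tape scheduler: trigger one-shot (non-continuous) replications
    # from cron instead of holding N continuous replications open.
    # Database names and the user roster are placeholders.
    import json
    import urllib2

    COUCH = "http://localhost:5984"
    USER_DBS = ["userdb-alice", "userdb-bob"]  # e.g. load this from a roster doc

    def replicate(source, target):
        body = json.dumps({"source": source, "target": target})
        req = urllib2.Request(COUCH + "/_replicate", body,
                              {"Content-Type": "application/json"})
        urllib2.urlopen(req)  # one-shot: returns once the replication completes

    for db in USER_DBS:
        replicate("master", db)  # fan out from the central db
        replicate(db, "master")  # and pull user writes back in

    # crontab sketch: run the latency-sensitive stuff often, backups less so:
    #   * * * * *     /usr/local/bin/sync_chat.py
    #   */30 * * * *  /usr/local/bin/sync_backups.py

Since the replicator checkpoints its progress in _local docs, re-running the same source/target pair should be relatively cheap when nothing has changed, so fairly aggressive cron schedules look tolerable.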
FWIW, I'm sharing with my client's permission the script I've been using to load test continuous filtered replication to/from a central master: https://gist.github.com/natevw/4711127

The test script sets up N+1 databases, writing documents to 1 as the master while replicating to the other N, as well as "short-polling" the _changes feed to kinda simulate general load on top of the longpolling the application does. On OS X I can only get to around 250 users due to some FD_SETSIZE stuff with Erlang, but it remains stable if I keep it under that limit; however, it takes the user database replications a *long* time to all get caught up (some don't even start until a few minutes after the changes stop).

hth,
-natevw


On Feb 4, 2013, at 2:50 PM, Robert Newson wrote:

> I had a mind to teach the _replicator db this trick. Since we have a
> record of everything we need to resume a replication, there's no reason
> for a one-to-one correspondence between a _replicator doc and a
> replicator process. We can simply run N of them for a bit (say, a
> batch of 1000 updates) and then switch to others. The internal
> db_updated mechanism is a good way to notice when we might have
> updates worth sending, but it's only half the story. A round-robin over
> all _replicator docs (other than one-shot ones, of course) seems a
> really neat trick to me.
>
> B.
>
> On 4 February 2013 22:39, Jan Lehnardt <[email protected]> wrote:
>>
>> On Feb 4, 2013, at 23:14, Nathan Vander Wilt <[email protected]> wrote:
>>
>>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>>> So I've heard from both hosting providers that it is fine, but also
>>>> managed to take both of their shared services "down" with only ~100
>>>> users (200 continuous filtered replications). I'm only now at the point
>>>> where I have tooling to build out arbitrarily large tests on my local
>>>> machine to see the stats for myself, but as I understand it the issue is
>>>> that every replication needs at least one couchjs process to do its
>>>> filtering for it.
>>>>
>>>> So rather than inactive users mostly just taking up disk space, they're
>>>> instead each costing a full-fledged process' worth of memory and system
>>>> resources, all the time. As I understand it, this isn't much better
>>>> on BigCouch either, since the data is scattered ± evenly on each machine,
>>>> so while the *computation* is spread, each node in the cluster still needs
>>>> k*numberOfUsers couchjs processes running. So it's "scalable" in the sense
>>>> that traditional databases are scalable: vertically, by buying machines
>>>> with more and more memory.
>>>>
>>>> Again, I am still working on getting a better feel for the costs involved,
>>>> but the basic theory with a master-to-N hub is not a great start: every
>>>> change needs to be processed by all N replications. So if a user writes
>>>> a document that ends up in the master database, every other user's filter
>>>> function needs to process that change coming out of master. Even when N
>>>> users are generating 0 (instead of M) changes, it's not doing M*N work, but
>>>> there are still always 2*N open connections and supporting processes
>>>> providing a nasty baseline for large values of N.
>>>
>>> Looks like I was wrong about needing enough RAM for one couchjs process per
>>> replication.
>>>
>>> CouchDB maintains a pool of (no more than
>>> query_server_config/os_process_limit) couchjs processes, and work is divvied
>>> out amongst these as necessary.
>>> I found a little meta-discussion of this system at
>>> https://issues.apache.org/jira/browse/COUCHDB-1375 and the code that
>>> uses it is here:
>>> https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
>>>
>>> On my laptop, I was able to spin up 250 users without issue. Beyond that, I
>>> start running into ± hardcoded system resource limits that Erlang has under
>>> Mac OS X, but from what I've seen the only theoretical scalability issue
>>> with going beyond that on Linux/Windows would be response times, as the
>>> worker processes become more and more saturated.
>>>
>>> It still seems wise to implement tiered replications for communicating
>>> between thousands of *active* user databases, but that seems reasonable to
>>> me.
>>
>> CouchDB's design is obviously lacking here.
>>
>> For immediate relief, I'll propose the usual jackhammer of unpopular
>> responses: write your filters in Erlang. (sorry :)
>>
>> For the future: we already see progress in improving the view server
>> situation. Once we get to a more desirable setup (yaynode/v8), we can
>> improve the view server communication; there is no reason you'd need a
>> single JS OS process per active replication, and we should absolutely fix
>> that.
>>
>> --
>>
>> Another angle is the replicator. I know Jason Smith has a prototype of this
>> in Node, and it works. Instead of maintaining N active replications, we just
>> keep a maximum number of active connections and cycle out ones that are
>> currently inactive. The DbUpdateNotification mechanism should make this
>> relatively straightforward. There is added overhead for setting up and
>> tearing down replications, but we can make better use of resources and not
>> clog things with inactive replications. Especially in a db-per-user
>> scenario, most replications don't see much of an update most of the time;
>> they should be inactive until data is written to any of the source
>> databases. The mechanics in CouchDB are all there for this, we just need to
>> write it.
>>
>> --
>>
>> Nate, thanks for sharing your findings and for bearing with us, despite your
>> very understandable frustrations. It is people like you who allow us to
>> make CouchDB better!
>>
>> Best
>> Jan
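For reference, Jan's "write your filters in Erlang" suggestion above would look something like the sketch below. It assumes the native Erlang query server is enabled in local.ini ([native_query_servers] erlang = {couch_native_process, start_link, []}); the ddoc name, the "owner" field, and the query_params wiring are made up for illustration, and the exact Erlang filter calling convention is worth double-checking against the docs for your CouchDB version:

    #!/usr/bin/env python
    # Sketch: install an Erlang filter so filtered replications run inside the
    # CouchDB VM instead of competing for couchjs worker processes.
    # Names ("sync", "by_owner", "owner") are placeholders.
    import json
    import urllib2

    ERLANG_FILTER = """
    fun({Doc}, {Req}) ->
        {Query} = proplists:get_value(<<"query">>, Req, {[]}),
        Owner = proplists:get_value(<<"owner">>, Query),
        proplists:get_value(<<"owner">>, Doc) =:= Owner
    end.
    """

    DDOC = {
        "_id": "_design/sync",
        "language": "erlang",
        "filters": {"by_owner": ERLANG_FILTER},
    }

    req = urllib2.Request("http://localhost:5984/master/_design/sync",
                          json.dumps(DDOC), {"Content-Type": "application/json"})
    req.get_method = lambda: "PUT"
    urllib2.urlopen(req)

Each per-user replication (or _replicator doc) would then reference it with "filter": "sync/by_owner" and "query_params": {"owner": "alice"}, same as with a JS filter.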
