On Mon, Jan 11, 2010 at 12:38 PM, Damien Katz (JIRA) <[email protected]> wrote:
>
>      [ https://issues.apache.org/jira/browse/COUCHDB-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Damien Katz closed COUCHDB-620.
> -------------------------------
>
>     Resolution: Invalid
>       Assignee: Damien Katz
>
> Closing this as invalid, as it has no objective criteria for ever being
> closed, and the current trunk already implements most of the suggestions
> proposed.
>
+1

However, you could open a ticket for this:

> * Run as many couchjs instances as there are processor cores and scatter work
> amongst them

What I'd really like to see is a solid test harness that can verify it works,
and also a benchmark to prove it helps in practice.

This ticket would be a good introduction to CouchDB for an experienced
Erlanger, I'm guessing: someone who has a grasp of the supervisor tree, etc.

>> ----------------------------------------------------------------------------------------------
>>
>>                 Key: COUCHDB-620
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-620
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: Infrastructure
>>    Affects Versions: 0.10
>>         Environment: Ubuntu 9.10 64 bit, CouchDB 0.10
>>            Reporter: Roger Binns
>>            Assignee: Damien Katz
>>
>> Generating views is extremely slow. For example, adding 10 million documents
>> takes less than 10 minutes, but generating some simple views on the same docs
>> takes over 4 hours.
>> Using top you can see that CouchDB (erlang) and couchjs between them cannot
>> even saturate a single CPU, let alone the I/O system. Under ideal conditions
>> performance should be limited by CPU, disk or memory. This implies that the
>> processes are doing simple things in lockstep, accumulating latencies in each
>> process as well as in the communication between them, which, when multiplied
>> by the number of documents, can add up to a lot.
>> Some suggestions:
>> * Run as many couchjs instances as there are processor cores and scatter
>> work amongst them
>> * Have some sort of pipelining in the erlang so that the moment the first
>> byte of response is received from couchjs the data is sent for the next
>> request (the JSON conversion, HTTP headers etc. should all have been
>> assembled already) to reduce latencies. Do whatever is most similar in
>> couchjs (e.g. use separate threads to read requests, process them and write
>> responses).
>> * Use the equivalent of HTTP pipelining when talking to couchjs so that it
>> always has a doc ready to work on, rather than having to transmit an entire
>> response and then wait for erlang to think and provide an entire new request.
>> A simple test of success is to have a database with a million or so
>> documents with a trivial view and have view creation max out the CPU,
>> memory or disk.
>> Some things in CouchDB make this a particularly nasty problem. View data is
>> not replicated, so replicated documents can lead the view data by a large
>> margin on the recipient database. This can lead to inconsistencies. You
>> also can't expect users to then wait minutes (or hours) for a request to
>> complete because view generation got that far behind. (My own plans now
>> are to not use replication and instead create the database file on another
>> CouchDB instance and then rsync the binary database file over instead!)
>> Although stale=ok is available, you still have no idea whether the response
>> will be quick or take as long as view generation does. (Sure, I could add
>> some sort of timeout and complicate the code, but then what value do I pick?
>> If I have a user waiting I want an answer ASAP, or I have to give them some
>> horrible error message. Taking a long wait and then giving a timeout is
>> even worse!)

--
Chris Anderson
http://jchrisa.net
http://couch.io
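A minimal Python sketch of the first and third suggestions quoted above (one worker per core, and pipelined requests), under loudly stated assumptions: the worker is a Python stand-in for couchjs, it speaks a made-up protocol of one JSON document per input line and one JSON list of [key, value] rows per output line (this is NOT CouchDB's real view server protocol), and `index_docs`, `run_worker` and `WORKER_SRC` are hypothetical names. Docs are scattered round-robin across the worker processes, and each worker's batch is written without waiting for individual replies, so it always has the next doc ready. It is not a benchmark; measuring whether this helps in practice is exactly the harness being asked for.

```python
import json
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a couchjs-like view server (NOT CouchDB's
# real protocol): reads one JSON doc per stdin line and writes one JSON
# line of emitted [key, value] rows per doc to stdout.
WORKER_SRC = r"""
import json, sys
for line in sys.stdin:
    doc = json.loads(line)
    # trivial "map function": emit(doc.type, 1) when the doc has a type
    rows = [[doc["type"], 1]] if "type" in doc else []
    sys.stdout.write(json.dumps(rows) + "\n")
"""


def run_worker(docs):
    """Send a whole batch of docs to one worker process.

    Every request is written without waiting for its reply (the
    pipelining idea); communicate() writes stdin and reads stdout
    concurrently, so a full pipe buffer cannot deadlock us.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    payload = "".join(json.dumps(d) + "\n" for d in docs)
    out, _ = proc.communicate(payload)
    return [json.loads(line) for line in out.splitlines()]


def index_docs(docs, workers=None):
    """Scatter docs over one worker process per CPU core, then gather."""
    n = workers or os.cpu_count() or 1
    # Round-robin partition: doc i goes to worker i % n.
    shards = [docs[w::n] for w in range(n)]
    # Threads only wait on I/O; the real work runs in n OS processes.
    with ThreadPoolExecutor(max_workers=n) as pool:
        shard_results = list(pool.map(run_worker, shards))
    # Re-interleave: element j of shard w came from input index w + j*n.
    results = [None] * len(docs)
    for w, rows in enumerate(shard_results):
        for j, r in enumerate(rows):
            results[w + j * n] = r
    return results
```

Calling `index_docs(docs)` returns one list of emitted rows per input doc, in input order. Handing each worker its whole batch via `communicate()` keeps the sketch short; a real indexer would more likely stream documents with a bounded number of requests in flight per worker, so memory stays flat on a database with millions of docs.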
