Let's have this discussion on the dev mailing list. That's what it's for.

-Damien

On Jan 10, 2010, at 9:27 PM, Roger Binns (JIRA) wrote:

> Generating views is extremely slow - makes CouchDB hard to use with non-trivial number of docs
> ----------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-620
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-620
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: 0.10
>         Environment: Ubuntu 9.10 64 bit, CouchDB 0.10
>            Reporter: Roger Binns
>
>
> Generating views is extremely slow. For example adding 10 million documents
> takes less than 10 minutes but generating some simple views on the same docs
> takes over 4 hours.
>
> Using top you can see that CouchDB (erlang) and couchjs between them cannot
> even saturate a single CPU let alone the I/O system. Under ideal conditions
> performance should be limited by cpu, disk or memory. This implies that the
> processes are doing simple things in lockstep accumulating latencies in each
> process as well as the communication between them which when multiplied by
> the number of documents can amount to a lot.
>
> Some suggestions:
>
> * Run as many couchjs instances as there are processor cores and scatter work
>   amongst them
>
> * Have some sort of pipelining in the erlang so that the moment the first
>   byte of response is received from couchjs the data is sent for the next
>   request (the JSON conversion, HTTP headers etc should all have been assembled
>   already) to reduce latencies. Do whatever is most similar in couchjs (eg use
>   separate threads to read requests, process them and write responses).
>
> * Use the equivalent of HTTP pipelining when talking to couchjs so that it
>   always has a doc ready to work on rather than having to transmit an entire
>   response and then wait for erlang to think and provide an entire new request
>
> A simple test of success is to have a database with a million or so documents
> with a trivial view and have view creation max out the CPU, memory or disk.
>
> Some things in CouchDB make this a particularly nasty problem. View data is
> not replicated so replicating documents can lead the view data by a large
> margin on the recipient database. This can lead to inconsistencies. You
> also can't expect users to then wait minutes (or hours) for a request to
> complete because the view generation got that far behind. (My own plans now
> are to not use replication and instead create the database file on another
> couchdb instance and then rsync the binary database file over instead!)
>
> Although stale=ok is available, you still have no idea if the response will
> be quick or take however long view generation does. (Sure I could add some
> sort of timeout and complicate the code but then what value do I pick? If I
> have a user waiting I want an answer ASAP or I have to give them some
> horrible error message. Taking a long wait and then giving a timeout is even
> worse!)
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
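
For scale, the figures in the report work out to at least about 1.44 ms per document for view generation (4 hours over 10 million docs) against at most about 0.06 ms per document for insertion (10 minutes over 10 million docs), a gap of roughly 24x. The pipelining suggestion in the report could be sketched roughly as below. This is only an illustration in Python using threads and queues, not CouchDB's actual Erlang code or the real couchjs protocol. The names map_doc, view_worker, build_view_pipelined and the depth parameter are all hypothetical. The idea shown is that the feeder keeps up to depth requests queued so the worker never sits idle waiting for the next doc.

```python
import queue
import threading


def map_doc(doc):
    # Hypothetical stand-in for a view's map function: emit (key, value) pairs.
    return [(doc["type"], 1)]


def view_worker(requests, responses):
    # Stand-in for a couchjs-like process: read a request, map it, write a response.
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more docs
            responses.put(None)
            return
        seq, doc = item
        responses.put((seq, map_doc(doc)))


def build_view_pipelined(docs, depth=8):
    # Bounded request queue: up to `depth` docs are already waiting for the
    # worker, instead of one request/response round trip per document.
    requests = queue.Queue(maxsize=depth)
    responses = queue.Queue()

    worker = threading.Thread(target=view_worker, args=(requests, responses))
    worker.start()

    def feed():
        # Runs concurrently with the collector below, so sending the next
        # request does not wait on the previous response being consumed.
        for seq, doc in enumerate(docs):
            requests.put((seq, doc))
        requests.put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()

    rows = []
    while True:
        item = responses.get()
        if item is None:
            break
        seq, emitted = item
        rows.extend((key, value, seq) for key, value in emitted)

    feeder.join()
    worker.join()
    return rows


if __name__ == "__main__":
    sample = [{"type": "a"}, {"type": "b"}, {"type": "a"}]
    for row in build_view_pipelined(sample, depth=2):
        print(row)
```

With a single worker the rows come back in document order. Scattering across one worker per core, as the first suggestion proposes, would additionally need the rows merged back by sequence number.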
