[ https://issues.apache.org/jira/browse/COUCHDB-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799586#action_12799586 ]
Brian Candler commented on COUCHDB-620:
---------------------------------------
As Paul says, I am not proposing any change to the couchjs protocol, nor
allowing couchjs to return responses out of order.
Just this: the core takes the next 3 (say) documents to be processed,
stuffs them down the socket to couchjs, and then sends another one each time
a complete response for a document is received.
The couchjs view server is completely unaware of this, since it runs in
lock-step (read a request, emit a response, read a request, emit a response).
It's just that when it next comes to read a request, one will already be
waiting for it. In other words, it's the same as HTTP pipelining.
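A minimal sketch of that scheme, written in Python rather than Erlang and resting on loud assumptions: a thread stands in for the couchjs process, two queue.Queue objects stand in for the pipe in each direction, and every name here (lockstep_worker, pipelined_index, the None shutdown sentinel, the doubling "map function") is hypothetical. The worker stays strictly lock-step; only the driver changes, keeping up to `window` requests outstanding and sending one more per complete response.

```python
import queue
import threading

def lockstep_worker(requests, responses):
    """Stand-in for couchjs: read one request, emit one response, repeat.

    It never looks ahead; the only difference pipelining makes is that the
    next request is usually already waiting when it goes to read.
    """
    while True:
        doc = requests.get()
        if doc is None:                  # hypothetical shutdown sentinel
            return
        # Hypothetical "map function": emit one (doc id, value * 2) row.
        responses.put((doc["_id"], doc["value"] * 2))

def pipelined_index(docs, window=3):
    """Keep up to `window` docs in flight; send one more per whole response."""
    if window < 1:
        raise ValueError("window must be at least 1")
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=lockstep_worker, args=(requests, responses))
    worker.start()

    pending = iter(docs)
    in_flight = 0
    # Prime the pipe with the first `window` documents.
    for doc in pending:
        requests.put(doc)
        in_flight += 1
        if in_flight == window:
            break
    peak = in_flight

    results = []
    while in_flight:
        results.append(responses.get())  # one complete response back...
        in_flight -= 1
        nxt = next(pending, None)        # ...so one more request goes out
        if nxt is not None:
            requests.put(nxt)
            in_flight += 1
            peak = max(peak, in_flight)

    requests.put(None)                   # tell the worker to stop
    worker.join()
    return results, peak

docs = [{"_id": f"doc{i}", "value": i} for i in range(10)]
results, peak = pipelined_index(docs, window=3)
print(results[:2], peak)
```

Because the worker answers in the order it reads, responses come back in send order, which keeps the bookkeeping on the core's side trivial.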
I don't see any particular problem with error handling. If you haven't received
a complete response for document X, you don't update the view pointer, so
you'll try again next time.
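A sketch of that recovery rule, again in hypothetical Python: run_indexer, ViewServerCrashed and the (seq, doc) change list are illustrative stand-ins rather than CouchDB internals, and the rows are kept in memory instead of being written to the view index together with the checkpoint.

```python
class ViewServerCrashed(Exception):
    """Hypothetical: couchjs died before finishing a response."""

def run_indexer(changes, map_fn, checkpoint):
    """Index (seq, doc) pairs newer than `checkpoint`.

    Returns (rows, new_checkpoint). The checkpoint (the view pointer) moves
    only after a complete response for that document has been received.
    """
    rows = []
    for seq, doc in changes:
        if seq <= checkpoint:
            continue                 # already indexed on an earlier run
        try:
            emitted = map_fn(doc)    # the whole response for this doc, or an error
        except ViewServerCrashed:
            break                    # no complete response: leave the pointer alone
        rows.extend(emitted)
        checkpoint = seq             # safe to advance now
    return rows, checkpoint

def healthy(doc):
    return [(doc["n"], doc["n"] ** 2)]

def flaky(doc):
    if doc["n"] == 4:
        raise ViewServerCrashed()
    return healthy(doc)

changes = [(seq, {"n": seq}) for seq in range(1, 7)]
rows1, cp1 = run_indexer(changes, flaky, checkpoint=0)      # stops with cp1 == 3
rows2, cp2 = run_indexer(changes, healthy, checkpoint=cp1)  # resumes at seq 4
print(cp1, cp2, rows1 + rows2 == [(n, n * n) for n in range(1, 7)])
```

Any document whose response never arrived, including ones already sent ahead under pipelining, is simply re-sent on the next run, so pipelining does not change the recovery story.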
> Generating views is extremely slow - makes CouchDB hard to use with
> non-trivial number of docs
> ----------------------------------------------------------------------------------------------
>
> Key: COUCHDB-620
> URL: https://issues.apache.org/jira/browse/COUCHDB-620
> Project: CouchDB
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: 0.10
> Environment: Ubuntu 9.10 64 bit, CouchDB 0.10
> Reporter: Roger Binns
> Assignee: Damien Katz
> Attachments: pipelining.jpg
>
>
> Generating views is extremely slow. For example adding 10 million documents
> takes less than 10 minutes but generating some simple views on the same docs
> takes over 4 hours.
> Using top, you can see that CouchDB (Erlang) and couchjs together cannot
> even saturate a single CPU, let alone the I/O system. Under ideal conditions,
> performance should be limited by CPU, disk or memory. This implies that the
> processes are doing simple things in lockstep, accumulating latency in each
> process and in the communication between them; multiplied by the number of
> documents, that adds up to a lot.
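To make the lockstep argument concrete, here is a back-of-the-envelope model in Python. The per-document costs are made-up illustrative numbers, not measurements from this issue, and the pipelined formula assumes that enough requests are in flight to hide the round-trip latency and that Erlang and couchjs run on separate cores.

```python
def lockstep_us(n_docs, erlang_us, js_us, latency_us):
    # Lockstep: every doc pays Erlang work + couchjs work + the pipe round
    # trip, strictly one after another, so nothing overlaps.
    return n_docs * (erlang_us + js_us + latency_us)

def pipelined_us(n_docs, erlang_us, js_us, latency_us):
    # Pipelined: the first doc still pays the full path (pipeline fill), then
    # the slower of the two processes sets the pace; latency is hidden.
    if n_docs == 0:
        return 0
    return (erlang_us + js_us + latency_us) + (n_docs - 1) * max(erlang_us, js_us)

# Illustrative, made-up per-doc costs in microseconds (not measurements).
N, ERL, JS, LAT = 1_000_000, 100, 200, 300
print(lockstep_us(N, ERL, JS, LAT) / 1e6, "s lockstep")
print(pipelined_us(N, ERL, JS, LAT) / 1e6, "s pipelined")
```

With these numbers the lockstep loop takes 600 s and the pipelined one about 200 s: pipelining takes the 300 µs of latency and the 100 µs of Erlang work off the critical path, leaving couchjs as the bottleneck, which is the point at which its CPU should be saturated.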
> Some suggestions:
> * Run as many couchjs instances as there are processor cores and scatter work
> amongst them.
> * Have some sort of pipelining in the Erlang code so that the moment the first
> byte of a response is received from couchjs, the data for the next request
> is sent (the JSON conversion, HTTP headers etc. should all have been
> assembled already) to reduce latency. Do whatever is most similar in couchjs
> (e.g. use separate threads to read requests, process them and write
> responses).
> * Use the equivalent of HTTP pipelining when talking to couchjs so that it
> always has a doc ready to work on, rather than having to transmit an entire
> response and then wait for Erlang to think and provide an entire new
> request.
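The first suggestion (one couchjs per core, with work scattered among them) can be sketched as follows. Assumptions, stated loudly: Python threads stand in for separate couchjs OS processes (in CPython, threads running Python code won't actually use multiple cores, so this shows the dispatch and reassembly shape, not the speedup), and map_doc and scatter_map are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def map_doc(doc):
    # Hypothetical view function: emit one (type, 1) row if the doc has a type.
    return [(doc["type"], 1)] if "type" in doc else []

def scatter_map(docs, map_fn, workers=None):
    """Fan docs out to `workers` map workers (default: one per core) and
    return the per-doc results in the original document order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order even when workers
        # finish out of order, so downstream code still sees documents in
        # update-sequence order.
        return list(pool.map(map_fn, docs))

docs = [{"_id": "a", "type": "post"}, {"_id": "b"}, {"_id": "c", "type": "comment"}]
print(scatter_map(docs, map_doc, workers=2))
```

Reassembling results in input order is the design choice that matters here: it lets the view pointer advance exactly as in the single-process case, even though the work itself runs in parallel.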
> A simple test of success is to have a database with a million or so documents
> and a trivial view, and have view creation max out the CPU, memory or disk.
> Some things in CouchDB make this a particularly nasty problem. View data is
> not replicated, so on the recipient database the replicated documents can get
> far ahead of the view data. This can lead to inconsistencies. You also can't
> expect users to wait minutes (or hours) for a request to complete because view
> generation got that far behind. (My own plan now is to not use replication,
> and instead to create the database file on another CouchDB instance and then
> rsync the binary database file over!)
> Although stale=ok is available, you still have no idea whether the response
> will be quick or take as long as view generation does. (Sure, I could add
> some sort of timeout and complicate the code, but then what value do I pick?
> If I have a user waiting, I want an answer ASAP, or I have to give them some
> horrible error message. Waiting a long time and then giving a timeout is
> even worse!)