Re: [jira] Closed: (COUCHDB-620) Generating views is extremely slow - makes CouchDB hard to use with non-trivial number of docs

Chris Anderson Mon, 11 Jan 2010 12:50:33 -0800

On Mon, Jan 11, 2010 at 12:38 PM, Damien Katz (JIRA) <[email protected]> wrote:
>
>     [ 
> https://issues.apache.org/jira/browse/COUCHDB-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
>
> Damien Katz closed COUCHDB-620.
> -------------------------------
>
>    Resolution: Invalid
>      Assignee: Damien Katz
>
> Closing this as invalid, as it has no objective criteria for ever being 
> closed, and the current trunk already implements most of the suggestions 
> proposed.
>


+1

however, you could open a ticket for this:

> * Run as many couchjs instances as there are processor cores and scatter work 
> amongst them

what I'd really like to see is a solid test harness that can verify the
results are still correct, and also a benchmark to prove it helps in practice.

this ticket would be a good intro to CouchDB, for an experienced
Erlanger, I'm guessing. Someone who has a grasp of the supervisor tree
etc.


>> ----------------------------------------------------------------------------------------------
>>
>>                 Key: COUCHDB-620
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-620
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: Infrastructure
>>    Affects Versions: 0.10
>>         Environment: Ubuntu 9.10 64 bit, CouchDB 0.10
>>            Reporter: Roger Binns
>>            Assignee: Damien Katz
>>
>> Generating views is extremely slow.  For example adding 10 million documents 
>> takes less than 10 minutes but generating some simple views on the same docs 
>> takes over 4 hours.
>> Using top you can see that CouchDB (erlang) and couchjs between them cannot 
>> even saturate a single CPU let alone the I/O system.  Under ideal conditions 
>> performance should be limited by cpu, disk or memory.  This implies that the 
>> processes are doing simple things in lockstep accumulating latencies in each 
>> process as well as the communication between them which when multiplied by 
>> the number of documents can amount to a lot.
>> Some suggestions:
>> * Run as many couchjs instances as there are processor cores and scatter 
>> work amongst them
>> * Have some sort of pipelining in the erlang so that the moment the first 
>> byte of response is received from couchjs the data is sent for the next 
>> request (the JSON conversion, HTTP headers etc should all have been 
>> assembled already) to reduce latencies.  Do whatever is most similar in 
>> couchjs (eg use separate threads to read requests, process them and write 
>> responses).
>> * Use the equivalent of HTTP pipelining when talking to couchjs so that it 
>> always has a doc ready to work on rather than having to transmit an entire 
>> response and then wait for erlang to think and provide an entire new request
>> A simple test of success is to have a database with a million or so 
>> documents with a trivial view and have view creation max out the CPU, 
>> memory or disk.
>> Some things in CouchDB make this a particularly nasty problem.  View data is 
>> not replicated so replicating documents can lead the view data by a large 
>> margin on the recipient database.  This can lead to inconsistencies.  You 
>> also can't expect users to then wait minutes (or hours) for a request to 
>> complete because the view generation got that far behind.  (My own plans now 
>> are to not use replication and instead create the database file on another 
>> couchdb instance and then rsync the binary database file over instead!)
>> Although stale=ok is available, you still have no idea if the response will 
>> be quick or take however long view generation does.  (Sure I could add some 
>> sort of timeout and complicate the code but then what value do I pick?  If I 
>> have a user waiting I want an answer ASAP or I have to give them some 
>> horrible error message.  Taking a long wait and then giving a timeout is 
>> even worse!)
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
