Efficient view design question

Jonathan Moss Mon, 27 Oct 2008 04:24:41 -0700

Greetings all,

I am currently writing a set of classes to map a PHP object model to and from CouchDB. The PHP objects are hierarchical, and I have modelled this as essentially a doubly linked list: every document within CouchDB has a 'Children' array and a 'Parents' array. These arrays contain the IDs of related objects.
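To make the structure concrete, here is a rough sketch of three linked documents. Only the 'Children' and 'Parents' field names come from the description above; the IDs and the linksAreReciprocal helper are made up for illustration.

```javascript
// Hypothetical documents: one parent with two children.
var sampleDocs = [
  { _id: "section-1", Parents: [], Children: ["page-1", "page-2"] },
  { _id: "page-1", Parents: ["section-1"], Children: [] },
  { _id: "page-2", Parents: ["section-1"], Children: [] }
];

// Hypothetical helper: returns true when every parent/child link
// is recorded on both ends (the "doubly linked" property).
function linksAreReciprocal(docs) {
  var byId = {};
  docs.forEach(function (doc) { byId[doc._id] = doc; });

  return docs.every(function (doc) {
    var childrenOk = doc.Children.every(function (childId) {
      var child = byId[childId];
      return child !== undefined && child.Parents.indexOf(doc._id) !== -1;
    });
    var parentsOk = doc.Parents.every(function (parentId) {
      var parent = byId[parentId];
      return parent !== undefined && parent.Children.indexOf(doc._id) !== -1;
    });
    return childrenOk && parentsOk;
  });
}

console.log(linksAreReciprocal(sampleDocs));
```

Because each link is stored twice, adding or removing a relationship means updating two documents.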


I already have a couple of map functions to retrieve children and parents:

  "childrenOf": {
    "map": "function(doc) {for(var idx in doc.Parents) {emit(doc.Parents[idx], doc);}}"
  },
  "parentsOf": {
    "map": "function(doc) {for(var idx in doc.Children) {emit(doc.Children[idx], doc);}}"
  }

These functions return whole documents. My understanding of views is that they would have to be re-generated every time a document is added, removed or updated. If this is the case, then as the number of documents in the database grows, the initial response time to retrieve one of these views would become considerable. In a small system where writes are uncommon and reads are regular, this would not be an issue. However, I am struggling to find more than a handful of niche applications where this would be true. In almost all the web applications I have written, almost every request to the website results in something (even if it is just tracking data) being written to the database. On a high-volume website this would result in views having to be re-created almost constantly. Therefore efficient view design becomes paramount.

The view functions shown above return the whole doc, which I know is inefficient. In fact, since I already have the document whose children/parents I want, I also already have all the child/parent IDs. Would it be much more efficient to simply retrieve the parent/child documents individually rather than having to re-generate views all the time?
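For comparison, a minimal sketch of a leaner 'childrenOf' that emits null instead of the whole document. It relies on each view row also carrying the ID of the document that emitted it, so the index stays small and the documents can be fetched separately. The emit stub, the rows array and the sample documents are hypothetical stand-ins so the snippet runs outside CouchDB; inside CouchDB only the childrenOf function body would be needed.

```javascript
// Stand-ins for the view index and for the emit() that CouchDB
// provides to map functions (hypothetical test harness).
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Leaner "childrenOf": same key as before, but a null value
// instead of the whole document.
function childrenOf(doc) {
  for (var idx in doc.Parents) {
    emit(doc.Parents[idx], null);
  }
}

// Hypothetical sample documents.
var docs = [
  { _id: "root", Parents: [], Children: ["a", "b"] },
  { _id: "a", Parents: ["root"], Children: [] },
  { _id: "b", Parents: ["root"], Children: [] }
];

// Run the map function over every document, as the view engine would.
docs.forEach(function (doc) {
  currentId = doc._id;
  childrenOf(doc);
});

// Querying with key "root" yields the IDs of its children.
var childIds = rows
  .filter(function (row) { return row.key === "root"; })
  .map(function (row) { return row.id; });

console.log(childIds);
```

The trade-off is one extra fetch per child document, in exchange for an index that holds only keys and document IDs.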

As a side question: having to re-generate views constantly in this kind of situation could prove a real issue. I know that CouchDB is still pre-1.0 and the developers are necessarily focusing on 'getting it right' before 'getting it fast' (to coin a phrase :), but will speed improvements already on the roadmap make these worries moot except in very large databases, or is it always going to be an issue and therefore require some clever application design? For example, keeping frequently updated data in a traditional SQL DB and only keeping rarely updated data in CouchDB, which would be a shame.


Thanks,
Jon
