On 15 May 2009, at 09:38, Brian Candler wrote:

On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
The proposal would exclude a document from *all* views in a particular design doc. So you're only going to get a benefit from this if you have a large number of documents (or a number of large documents) which are not
required to be indexed in any view in that design doc.

Yep - and that is the point.  Consider Jan's example, where it was
filtering on doc['type']. If a database had (say) 10 potential values of 'type', then all filters that only care about a single type will only
care about 1 in 10 of those documents.

Sure, as long as *none* of the views in that design document care about a
significant proportion of the documents.

It's unusual that people will have docs which are completely unindexed, so I think this patch mainly helps in the case where the user has 10 separate design documents, each of which is only interested in documents of one type.

Of course, that's a perfectly legitimate way of using CouchDB, and I don't
oppose this change at all.

It might be possible to make the feature more general though. For example,
suppose each view had its own filter, and the Erlang server took the *union*
of those filters to work out which documents to send. Then, when sending a
document, it would also send a list of the views to process it with. This
could be used to simplify the view code by removing the doc.type test,
whilst getting the performance benefit automatically.

As I said in the original mail, this wouldn't be possible without a major
rewrite of the view server, and I'd rather not do that in light of other,
more important changes.

Cheers
Jan
--



Example:

 views:{
   view1:{
     filter:[{type:"foo"}],
     map:...
   },
   view2:{
     filter:[{type:"foo"},{type:"bar"}],
     map:...
   }
 }

When a document of type foo is sent, it would be sent to the view engine with a list ["view1","view2"] of the views to be invoked on it. A document of type bar would have ["view2"]. A document of type baz would not be sent
at all.
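
The routing rule described above can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not CouchDB code: the helper names `matches` and `viewsFor` are hypothetical, the filter semantics (a pattern matches when every key in it equals the same key in the document) are an assumption, and the real decision would be made on the Erlang side before anything is sent to the view server.

```javascript
// Hypothetical sketch; none of these names are CouchDB APIs.
// The map functions are omitted because routing only needs the filters.
const views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] },
};

// Assumed semantics: a pattern matches when every key in the pattern
// has the same value in the document.
function matches(pattern, doc) {
  return Object.keys(pattern).every((key) => doc[key] === pattern[key]);
}

// Return the names of the views whose filter accepts the document.
// An empty list means the document would not be sent at all.
function viewsFor(doc, views) {
  return Object.keys(views).filter((name) =>
    views[name].filter.some((pattern) => matches(pattern, doc))
  );
}

console.log(viewsFor({ type: "foo" }, views));
console.log(viewsFor({ type: "bar" }, views));
console.log(viewsFor({ type: "baz" }, views));
```

Under these assumptions, a foo document is routed to view1 and view2, a bar document to view2 only, and a baz document gets an empty list, so it would be skipped.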

But maybe this is too complicated, and going further down this route ends up
with an Erlang view server anyway.

Taking this to its extreme, we tested Jan's patch on a view which
matches very few documents in a large database.  Rebuilding that view
with a filter was 18 times faster than without the filter. We put this
down to the fact that the filter managed to avoid the JSON encode/decode
step for the vast majority of the docs in the database.

You also avoided sending the docs over the socket and waiting for the
response. So maybe latency is also part of the problem. Depends whether the
view server interface does any sort of pipelining of requests.

Regards,

Brian.

