On 15 May 2009, at 09:38, Brian Candler wrote:
On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
The proposal would exclude a document from *all* views in a particular
design doc. So you're only going to get a benefit from this if you have a
large number of documents (or a number of large documents) which are not
required to be indexed in any view in that design doc.
Yep - and that is the point. Consider Jan's example, where it was
filtering on doc['type']. If a database had (say) 10 potential values
of 'type', then all filters that only care about a single type will only
care about 1 in 10 of those documents.
Sure, as long as *none* of the views in that design document care about a
significant proportion of the documents.
It's unusual that people will have docs which are completely unindexed, so I
think this patch mainly helps in the case where the user has 10 separate
design documents, each of which is only interested in documents of one type.
Of course, that's a perfectly legitimate way of using CouchDB, and I don't
oppose this change at all.
It might be possible to make the feature more general though. For example,
suppose each view had its own filter, and the erlang server took the *union*
of those filters to work out which documents to send. Then, when sending a
document, it sent a list of which views to process it with. This could be
used to simplify the view code by removing the doc.type test, whilst getting
the performance benefit automatically.
As I said in the original mail, this wouldn't be possible without a
major rewrite of the view server, and I'd rather not do that in light of
other, more important changes.
Cheers
Jan
--
Example:
  views:{
    view1:{
      filter:[{type:"foo"}],
      map:...
    },
    view2:{
      filter:[{type:"foo"},{type:"bar"}],
      map:...
    }
  }
When a document of type foo is sent, it would be sent to the view engine
with a list ["view1","view2"] of the views to be invoked on it. A document
of type bar would have ["view2"]. A document of type baz would not be sent
at all.
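
To make the routing concrete, here is a minimal sketch in JavaScript. It is
only an illustration of the idea, not anything that exists in CouchDB: the
function names matchesFilter and viewsFor are hypothetical, and it assumes
that a filter is a list of patterns and that a document passes if it matches
every key of at least one pattern.

```javascript
// Hypothetical sketch only: these helpers are not part of CouchDB.

// Assumed semantics: a filter is a list of patterns. A document passes
// if, for at least one pattern, every key in that pattern has the same
// value in the document.
function matchesFilter(filter, doc) {
  return filter.some(function (pattern) {
    return Object.keys(pattern).every(function (key) {
      return doc[key] === pattern[key];
    });
  });
}

// Return the names of the views whose filter accepts the document.
// An empty list means the document need not be sent to the view
// server at all (the union of the filters rejects it).
function viewsFor(views, doc) {
  return Object.keys(views).filter(function (name) {
    return matchesFilter(views[name].filter, doc);
  });
}

// The two views from the example, with the map functions left out.
var views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] }
};

var forFoo = viewsFor(views, { type: "foo" }); // ["view1", "view2"]
var forBar = viewsFor(views, { type: "bar" }); // ["view2"]
var forBaz = viewsFor(views, { type: "baz" }); // []
```

In this sketch the decision is made before any JSON encoding happens, so a
rejected document costs only the pattern comparison.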
But maybe this is too complicated, and going further down this route ends up
with an erlang view server anyway.
Taking this to its extreme, we tested Jan's patch on a view which
matches very few documents in a large database. Rebuilding that view
with a filter was 18 times faster than without the filter. We put this
down to the fact that the filter managed to avoid the json encode/decode
step for the vast majority of the docs in the database.
You also avoided sending the docs over the socket and waiting for the
response. So maybe latency is also part of the problem. It depends on
whether the view server interface does any sort of pipelining of requests.
Regards,
Brian.