On 11 Apr 2009, at 04:31, kowsik wrote:

Parallel == multiple threads across multiple machines in the cluster? :-)

By definition, temp views don't have any disk IO.

They must get data to process from somewhere :)


It's map/reduce parallelized in memory, served directly back over a TCP
socket. Is that still not going to be fast enough?

A common fallacy with CouchDB's Map/Reduce is thinking that doing things
on multiple nodes is magically faster.

The sweet spot for Map/Reduce is heavy computation on small bits of
distributed data. CouchDB's views are the opposite: little computation
over huge amounts of data. Unless your data is already distributed across
participating nodes, distributed M/R is not going to make anything faster.

With upcoming clustering, you get partial data distribution and parallel
execution, but that doesn't mean that anything has to change in the
current view server code. (It has other areas that are open to speed
improvements.)

Cheers
Jan
--


K.

On Fri, Apr 10, 2009 at 7:26 PM, Paul Davis wrote:
On Fri, Apr 10, 2009 at 8:51 PM, kowsik wrote:
IMHO, the need for view intersections will go away once we have
parallel map/reduce to the point where _temp_views are fast!

K.


The lower bound for view generation is disk I/O. Temp views will never
be fast enough for production.
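As a rough back-of-the-envelope illustration (the sizes and throughput below are hypothetical, not figures from this thread, and 1 GB is taken as 1000 MB): if generating a view requires reading every document from disk, the generation time is bounded below by the total data read divided by the sustained disk throughput, however many map/reduce workers share that disk.

```latex
T_{\text{view}} \;\ge\; \frac{S_{\text{docs}}}{B_{\text{disk}}},
\qquad \text{e.g.}\quad
\frac{10\,\text{GB}}{100\,\text{MB/s}} = \frac{10\,000\,\text{MB}}{100\,\text{MB/s}} = 100\,\text{s}.
```

Parallelizing the map step can cut CPU time, but it leaves this bound unchanged unless the documents are also spread across independent disks.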

HTH,
Paul Davis

On Fri, Apr 10, 2009 at 10:04 AM, Wout Mertens wrote:

On Apr 10, 2009, at 11:46 AM, Sho Fukamachi wrote:

The obvious follow-up question to those examples is: "Well, how do I find a
document with all of (n) tags?"

How about this algorithm? It needs a tagcount view and a document-by-tag view:

- given a list of tags that the document should have,
- use the tagcount view to find the tag with the lowest document count,
- request all documents with that tag through the document-by-tag view,
- filter those documents manually, keeping only the ones that have all the given tags.
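The steps above can be sketched in Python. This is a minimal sketch under stated assumptions: both views are simulated with plain dictionaries instead of being queried from CouchDB, and the names `build_views` and `find_docs_with_all_tags` and the sample documents are hypothetical. Against a real database, step 1 would query a view that counts documents per tag, and step 2 would request only the rows for the rarest tag from the document-by-tag view.

```python
from collections import defaultdict

# Hypothetical sample data: document id -> set of tags.
DOCS = {
    "doc1": {"couchdb", "erlang", "mapreduce"},
    "doc2": {"couchdb", "javascript"},
    "doc3": {"erlang", "mapreduce"},
    "doc4": {"couchdb", "erlang", "javascript", "mapreduce"},
}


def build_views(docs):
    """Simulate the two views: tag -> doc ids (in id order), and tag -> doc count."""
    by_tag = defaultdict(list)
    for doc_id in sorted(docs):
        for tag in docs[doc_id]:
            by_tag[tag].append(doc_id)
    tagcount = {tag: len(ids) for tag, ids in by_tag.items()}
    return dict(by_tag), tagcount


def find_docs_with_all_tags(wanted, docs, by_tag, tagcount):
    """Return ids of documents that carry every tag in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return []
    # Step 1: pick the wanted tag with the fewest documents (the tagcount view).
    # Sorting first makes ties resolve deterministically.
    rarest = min(sorted(wanted), key=lambda tag: tagcount.get(tag, 0))
    # Step 2: fetch only that tag's documents (the document-by-tag view).
    candidates = by_tag.get(rarest, [])
    # Step 3: filter on the client, keeping documents that have all wanted tags.
    return [doc_id for doc_id in candidates if wanted <= docs[doc_id]]


by_tag, tagcount = build_views(DOCS)
print(find_docs_with_all_tags(["couchdb", "javascript"], DOCS, by_tag, tagcount))
# prints ['doc2', 'doc4']
```

Here "javascript" has two documents and "couchdb" three, so only the two "javascript" documents are fetched and checked.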

If that still means too many documents, make a view that emits every
combination of two tags a document has; querying by a pair of tags
narrows the candidates much further.
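A sketch of the pair view, again in Python with a plain dict standing in for the view index. `map_tag_pairs`, `build_pair_view` and `lookup_pair` are hypothetical names; in CouchDB the map function would be JavaScript emitting a two-element array key, which the tuple here imitates. Sorting the tags before pairing means the client can build the key for any pair without knowing the order in which the tags were stored.

```python
from collections import defaultdict
from itertools import combinations


def map_tag_pairs(doc):
    """Yield one (key, doc id) row per unordered pair of the document's tags."""
    # Deduplicate and sort so a pair is emitted once, always in the same order.
    tags = sorted(set(doc.get("tags", [])))
    for pair in combinations(tags, 2):
        yield pair, doc["_id"]


def build_pair_view(docs):
    """Collect the emitted rows into a key -> list of doc ids index."""
    view = defaultdict(list)
    for doc in docs:
        for key, doc_id in map_tag_pairs(doc):
            view[key].append(doc_id)
    return dict(view)


def lookup_pair(view, tag1, tag2):
    """Build the canonical (sorted) key on the client and look it up."""
    key = tuple(sorted((tag1, tag2)))
    return view.get(key, [])


docs = [
    {"_id": "doc1", "tags": ["erlang", "couchdb", "mapreduce"]},
    {"_id": "doc2", "tags": ["couchdb", "javascript"]},
    {"_id": "doc4", "tags": ["mapreduce", "javascript", "erlang", "couchdb"]},
]
view = build_pair_view(docs)
print(lookup_pair(view, "mapreduce", "couchdb"))
# prints ['doc1', 'doc4']
```

The trade-off is index size: a document with n tags emits n(n-1)/2 rows, so a document with 10 tags contributes 45 rows instead of 10.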

It would be neat if one could post a temporary view that runs against a subset of the output of a real view. That way the view server farm could do
the filtering...

Wout.




