On Jun 28, 2008, at 08:03, David King wrote:
I'm trying to gain a fundamental understanding of views and indexed
data. If this is documented in a FAQ, please direct me there
instead :)
In trying to map my understanding from SQL,
Here we have to tackle the first issue: do not try to map what you know
from SQL to CouchDB. Try to understand how CouchDB works on its own
terms, and then apply your problems to it. A direct translation will not
work, and may leave you thinking CouchDB is crap simply because it is not
an RDBMS, which would be the wrong conclusion. On the other hand, it is
perfectly possible that CouchDB is not the right tool for your job,
but it is certainly cool that you are checking it out :)
it appears that the answer to quickly querying data is by pre-
calculating query result-sets and storing them in tables, called
views. A view is a table populated by a function that runs against
every object that is written or modified in the database.
1. How would you implement a query against a value that changes
after the view is populated, like the current time? That is, if I
wanted things younger than a week, a permanent view like this:
function(doc) {
  if (doc.date > now() - timeinterval('1 week')) {
    emit(null, doc);
  }
}
(date-syntax liberally made up) the results of that query, if
populated when the data is changed, would quickly be invalid,
because now() has changed. Is this accurate? How would you
performantly run a query like this?
Your map functions must return the same result for the same input, so
things like now() cannot be used, and you usually don't need them. The
most interesting feature of the result set (or table, as you call it) of
the map function is that the 'first column', the key, can be used for
fast lookups.
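So what you would do here instead is:
function(doc) {
  emit(doc.date, null);
}
and query with
/db/_view/date/name?startkey=timestamp_from_interval('1 week')&endkey=now()
where the client computes both timestamps at query time (the function
names are made up, like yours). The view is stored sorted by key, so
finding the start of the range takes logarithmic time instead of a scan
over every document.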
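To make the idea concrete, here is a minimal sketch in plain JavaScript that imitates what happens, without talking to CouchDB at all. The helpers buildView and queryRange are hypothetical names invented for this illustration, the sample documents are made up, and emit is passed in as a second argument only so the snippet runs standalone (inside CouchDB it is provided for you). It assumes dates are stored as millisecond timestamps.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Imitates view building: run the map function over every document
// and keep the emitted rows sorted by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Imitates ?startkey=...&endkey=...: binary-search for the first row
// with key >= startkey, then walk forward until key > endkey.
function queryRange(rows, startkey, endkey) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < startkey) lo = mid + 1;
    else hi = mid;
  }
  const result = [];
  for (let i = lo; i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// A fixed "now" keeps the example reproducible; a real client would use Date.now().
const now = Date.UTC(2008, 5, 28);
const docs = [
  { _id: 'a', date: now - 10 * DAY_MS }, // older than a week
  { _id: 'b', date: now - 3 * DAY_MS },
  { _id: 'c', date: now - 1 * DAY_MS },
];

// The map function is deterministic: it only looks at the document.
const view = buildView(docs, (doc, emit) => emit(doc.date, null));

// "now" enters only at query time, as the range boundaries.
const recent = queryRange(view, now - 7 * DAY_MS, now);
console.log(recent.map((row) => row.id));
```

The point of the split is that the stored rows never go stale: only the range boundaries change from one request to the next.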
2. Same question for a permanent view containing the youngest 10
items (this one might be easier)?
Same thing. I note that you explicitly mention permanent views. Do not
use temporary views in production, only during development.
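For the "youngest 10" case, the same date-keyed view can be read from the newest end and cut off after ten rows. As far as I know, current CouchDB releases expose this through the descending and limit query options; builds from around the time of this mail may name the row-count option differently, so check the documentation for your version. The sketch below only imitates that behaviour in plain JavaScript; the youngest helper and the sample rows are hypothetical.

```javascript
// Imitates reading a date-keyed view in descending key order
// and stopping after n rows.
function youngest(rows, n) {
  // Copy before sorting so the caller's array is left untouched.
  return [...rows].sort((a, b) => b.key - a.key).slice(0, n);
}

// 25 made-up rows whose keys stand in for timestamps.
const sampleRows = Array.from({ length: 25 }, (_, i) => ({
  id: 'doc' + i,
  key: 1000 + i,
  value: null,
}));

const top10 = youngest(sampleRows, 10);
console.log(top10.map((row) => row.id));
```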
3. The wiki doesn't mention parameterised views. So if I have a
document with an 'author' field, and I want a view such that I can
see everything that a given author wrote, do I need a view per
author? Given thousands of authors, what is the performance cost for
running a document through a few thousand author-functions?
Same as above:
function(doc) {
  emit(doc.author, null);
}
GET /db/_view/authors/name?key="authorname"
(the key is JSON-encoded, hence the quotes)
One view, extremely fast lookups.
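Here is a rough sketch of why one view is enough, again in plain JavaScript and not CouchDB's actual implementation. A Map stands in for the sorted view index, buildAuthorIndex is a hypothetical helper, and the documents are invented for the example.

```javascript
// Imitates a view whose map function is emit(doc.author, null):
// every document contributes one row keyed by its author.
function buildAuthorIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.author)) index.set(doc.author, []);
    index.get(doc.author).push(doc._id);
  }
  return index;
}

const posts = [
  { _id: 'p1', author: 'david' },
  { _id: 'p2', author: 'jan' },
  { _id: 'p3', author: 'david' },
];

const byAuthor = buildAuthorIndex(posts);

// Imitates ?key="david": the author is a query parameter, not part of the view.
const davidsPosts = byAuthor.get('david') || [];
console.log(davidsPosts);
```

Each document passes through a single map function once, no matter how many authors exist; the author name only shows up when you query.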
4. I know that the distribution bits are still being fleshed out,
but is it the intention that eventually views can be stored or
calculated on a separate server from the data (since they are
implemented as tables)?
Not sure what you mean by 'since they are implemented as tables', but
maybe it is just the SQL lingo that is confusing me. We don't have
tables (things might look like them, though). But yes, eventually you
will be able to distribute view creation. We haven't gotten around to
that yet.
Feel free to send in more questions as they come :-)
Cheers
Jan