Thank you for all the answers!
Benoit:
good idea, and I believe that is what Jan used here as well? I didn't
know about this before! Great solution!
Kristopher:
>In the current situation, if I write 10 times and read 100 times,
>the index may only be processed once if the 100 reads come after the
>10 writes -- in a more real-world situation though, we're only going
>to be updating the index as many times as we write -- but keep in mind
>that given the right circumstances, the index only has to be
>regenerated once.
Thanks for your comments!
In my case it is actually more like 100 writes for each 10 reads...
So having to wait for a substantial view update takes quite a bit of
time.
>Also, please realize that the index does not have to be /completely/
>regenerated upon reindex -- only the documents that have been added/
>modified.
That is why I thought regenerating the view for changed documents on
save wouldn't be that much of a performance hit... but it turns out I
was wrong. I didn't think of low level stuff like the byte layout
mentioned by Jan. In my case updating on save is still going to be the
best solution I believe, because I can't afford waiting long for view
updating when I am requesting the views from the front end.
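To make that concrete, here is a minimal sketch of what I mean by
updating on save (the server, database, design document and view names
are all made up): right after every document write, the app fires a
zero-row view request so the index is warm before the next page load.

```shell
#!/bin/sh
# Hypothetical "update on save" helper -- the server, design document
# and view names are placeholders. count=0 returns no rows but still
# makes CouchDB bring the view index up to date.
refresh_view() {
  db="$1"
  curl -s "http://server:5984/$db/_view/pages/by_date?count=0" > /dev/null
}
```

Calling refresh_view after each save trades a little write throughput
for predictable read latency, which is the trade I want here.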
>Lastly, I would recommend that you try to optimize your view code to
>make things faster, as well.
This is also a good idea! Although my view code is really
straightforward. I have read all the wikis as well, but if you know
of any "best view practices" resource, or have any view optimization
tips handy, please let me know!
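For reference, this is the shape I try to keep my map functions in --
a sketch with made-up field names (emit is supplied by CouchDB's view
server): guard on a type field and emit only what the page actually
needs, since small emitted values keep the index compact and quick to
rebuild.

```javascript
// A deliberately lean map function; the doc fields are hypothetical.
// Only feed entries get indexed, and only the data the page displays
// is emitted -- not the whole document.
function map(doc) {
  if (doc.type === "feed_entry") {
    // Key by date so the view can be range-queried; value stays tiny.
    emit(doc.published_at, doc.title);
  }
}
```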
Cortland:
>Not sure why your views are timing out, but from my current
>understanding views are incrementally updated with modifications but
>only incrementally updated on a call to that view.
My bad for not being clear. The views themselves do not time out. The
web page generation for the end user times out because regenerating
the view takes so much time.
>Are you using the javascript SpiderMonkey view server (default) or
>another one? Check the complexity of the view and possibly minimize
>the view's complexity.
I am using the standard built-in SpiderMonkey view server. I believe
the views to be pretty clean, but then again I have a lot to learn
about best practices for document based databases!
>I'm not sure, but I think the pattern here is you put views most
>likely to be called near to each other in the design document, say
>blog summaries view followed by full content view, and have less
>related views in a different design document, say for a list of
>authors or a tag list.
Smart approach. I had grouped my views by the datatypes they
contain... different views for feeds in one design document, views
for feed entries in another, one design document for users and so
on... I'll see how much I can change it around for the better!
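To spell out the regrouping idea (all names made up): each design
document would hold only the views that get queried together, so
hitting a feed view never forces a rebuild of the entry views living
in a different design document.

```
{
  "_id": "_design/feed_pages",
  "views": {
    "summaries": {
      "map": "function(doc) { if (doc.type == 'feed') emit(doc.title, null); }"
    },
    "full_content": {
      "map": "function(doc) { if (doc.type == 'feed') emit(doc.title, doc.body); }"
    }
  }
}
```

A second design document, say _design/tags, would then carry the less
related views.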
Jan:
Thanks for all the comments!
I didn't know about the DbUpdateNotificationProcess functionality
before you and Benoit mentioned it! Just what I need!
#!/bin/sh
counter=0
max_docs=100
while true
do
  read database
  counter=`expr $counter + 1`
  if [ $counter -ge $max_docs ]; then
    # count=0 fetches no rows; the request just triggers the index update.
    curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
    counter=0
  fi
done
I am wondering, why is there a loop here? Isn't the shell script
called once every time the database receives an update? That is what
I gather from the documentation for the DbUpdateNotificationProcess
in the wiki...
Couldn't it just as well be written like this?
#!/bin/sh
read database
curl -s "http://server:5984/$database/_view/one_for_each/view?count=0" > /dev/null
curl -s "http://server:5984/$database/_view/other_view/jadijadd?count=0" > /dev/null
What is it I am not understanding?
Thanks for all the answers and your time!
Best regards
Sebastian
On Apr 28, 2008, at 5:22 AM, Jan Lehnardt wrote:
Heya Sebastian,
it seems you feel rather strongly about this issue. But that's
nothing a little engineering can't solve for you, read on :)
On Apr 28, 2008, at 01:04, Guby wrote:
Hello dear Couchers
I understand that the views are indexed the first time they are
accessed and as far as I know there is no way to turn on view
updating on document save. I really don't understand the reasoning
behind this behavior. The advantage of the pre-populated/indexed
views are that they are blazingly fast to query and access, but
that advantage disappears when the first request after a document
update has to regenerate the view first!
I am currently building a web app where the background processes
perform a lot of writes to the database. The time it takes to write
a document is not critical for me. What is critical though is the
time it takes to load web pages for the end user that require
content from the database.
In some situations the background processes add thousands of
documents to the database within a short period of time, and when
the user tries to access a page after such an update the view
querying sometimes takes minutes and as a consequence of that the
browser times out... Not a recipe for happy customers...
The only solution I can see at the moment is to create a worker
that queries the database whenever it is told that there has been a
document update, but that seems really stupid and unnecessary. And
in my case, running on a smallish VPS, it is a big waste of resources
to have an extra worker doing something the database itself could
just as well have done. It also requires a lot of extra coding to
notify the worker whenever I update or create a document throughout
my app.
That would be a rather extreme solution. Why not, for
example, trigger a view update from your document-
insertion code, every N (N = 10, 30, 60?) seconds?
I am sure you have reasons for having implemented the views the way
you have, but I would be really interested to hear why it has been
done this way!
1) To not have a 'write penalty' for all views when
documents are added. We expect you to have
quite a few views and updating all of them on-write
seems silly. The data is generated when needed,
saving resources by 2) not clogging them up when
needed elsewhere and 3) processing large quantities
of data in batches. And finally 4) the very layout of the
bytes that make up documents on disk and the way they
are read are optimised for super-fast index creation. This
is expected to be a common operation. I still understand
that this leaves things to be desired for you.
My wishes are for an optional updating of views on save feature! In
some cases that might regenerate a view several times without it
actually being accessed in between, but that is a tradeoff I can
live with, slow views on the other hand is something I can not!
Put this in a shell script called view_trigger.sh
#!/bin/sh
counter=0
max_docs=100
while true
do
  read database
  counter=`expr $counter + 1`
  if [ $counter -ge $max_docs ]; then
    # count=0 fetches no rows; the request just triggers the index update.
    curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
    counter=0
  fi
done
and add view_trigger.sh to your couch.ini as a
DbUpdateNotificationProcess
voilà :)
Yes, this is extra work externally, but this is still a sensible
solution. From our perspective, we do not need to change
the core server behaviour to get you what you need and
you still benefit from the batching of index creation.
Also, I'd like to second what Cortland said: All views in a
design document get updated if you query one of them.
Be aware of that :)
And on a final note: Thanks for writing in. Don't be
discouraged by the replies. If there are other things that
you would love to see in CouchDB, please let us know.
Also, if enough users request a feature, we will consider
putting it in, even on-