Thank you for all the answers!
Benoit:
good idea, and I believe that is what Jan used here as well? I didn't
know about this before! Great solution!
Kristopher:
>In the current situation, if I write 10 times and read 100 times,
>the index may only be processed once if the 100 reads come after the
>10 writes -- in a more real-world situation though, we're only going
>to be updating the index as many times as we write -- but keep in mind
>that given the right circumstances, the index only has to be
>regenerated once.
Thanks for your comments!
In my case it is actually more like 100 writes for each 10 reads...
So having to wait for a substantial view update takes quite a bit of
time.
>Also, please realize that the index does not have to be /completely/
>regenerated upon reindex -- only the documents that have been added/
>modified.
That is why I thought regenerating the view for changed documents on
save wouldn't be that much of a performance hit... but it turns out I
was wrong. I didn't think of low level stuff like the byte layout
mentioned by Jan. In my case updating on save is still going to be the
best solution I believe, because I can't afford waiting long for view
updating when I am requesting the views from the front end.
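To make that concrete, here is a minimal sketch of what I mean by
updating on save (the server, database, design document and view names
are all made up): right after every document write, the app fires a
zero-row view request so the index is warm before the next page load.

```shell
#!/bin/sh
# Hypothetical "update on save" helper -- the server, design document
# and view names are placeholders. count=0 returns no rows but still
# makes CouchDB bring the view index up to date.
refresh_view() {
  db="$1"
  curl -s "http://server:5984/$db/_view/pages/by_date?count=0" > /dev/null
}
```

Calling refresh_view after each save trades a little write throughput
for predictable read latency, which is the trade I want here.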
>Lastly, I would recommend that you try to optimize your view code to
>make things faster, as well.
This is also a good idea! Although my view code is really
straightforward. I have read all the wikis as well, but if you know
of any "best view practices" resource, or have any view optimization
tips handy, please let me know!
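For reference, this is the shape I try to keep my map functions in --
a sketch with made-up field names (emit is supplied by CouchDB's view
server): guard on a type field and emit only what the page actually
needs, since small emitted values keep the index compact and quick to
rebuild.

```javascript
// A deliberately lean map function; the doc fields are hypothetical.
// Only feed entries get indexed, and only the data the page displays
// is emitted -- not the whole document.
function map(doc) {
  if (doc.type === "feed_entry") {
    // Key by date so the view can be range-queried; value stays tiny.
    emit(doc.published_at, doc.title);
  }
}
```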
Cortland:
>Not sure why your views are timing out, but from my current
>understanding views are incrementally updated with modifications but
>only incrementally updated on a call to that view.
My bad for not being clear. The views themselves do not time out. The
web page generation for the end user times out because regenerating
the view takes so much time.
>Are you using the javascript SpiderMonkey view server (default) or
>another one? Check the complexity of the view and possibly minimize
>the view's complexity.
I am using the standard built-in SpiderMonkey view server. I believe
the views to be pretty clean, but then again I have a lot to learn
about best practices for document based databases!
>I'm not sure, but I think the pattern here is you put views most
>likely to be called near to each other in the design document, say
>blog summaries view followed by full content view, and have less
>related views in a different design document, say for a list of
>authors or a tag list.
Smart approach. I had grouped my views by the datatypes they
contain... different views for feeds in one design document, views
for feed entries in another, one design document for users and so
on... I'll see how much I can change it around for the better!
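To spell out the regrouping idea (all names made up): each design
document would hold only the views that get queried together, so
hitting a feed view never forces a rebuild of the entry views living
in a different design document.

```
{
  "_id": "_design/feed_pages",
  "views": {
    "summaries": {
      "map": "function(doc) { if (doc.type == 'feed') emit(doc.title, null); }"
    },
    "full_content": {
      "map": "function(doc) { if (doc.type == 'feed') emit(doc.title, doc.body); }"
    }
  }
}
```

A second design document, say _design/tags, would then carry the less
related views.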
Jan:
Thanks for all the comments!
I didn't know about the DbUpdateNotificationProcess functionality
before you and Benoit mentioned it! Just what I need!
#!/bin/sh
counter=0
max_docs=100
while true
do
  read database
  counter=`expr $counter + 1`
  if [ $counter -ge $max_docs ]; then
    # count=0 fetches no rows; the request just triggers the index update.
    curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
    counter=0
  fi
done
I am wondering, why is there a loop here? Isn't the shell script
called once every time the database receives an update? That is what
I gather from the documentation for the DbUpdateNotificationProcess
in the wiki...
Couldn't it just as well be written like this?
#!/bin/sh
read database
curl -s "http://server:5984/$database/_view/one_for_each/view?count=0" > /dev/null
curl -s "http://server:5984/$database/_view/other_view/jadijadd?count=0" > /dev/null
What is it I am not understanding?
Thanks for all the answers and your time!
Best regards
Sebastian
On Apr 28, 2008, at 5:22 AM, Jan Lehnardt wrote:
Heya Sebastian,
it seems you feel rather strongly about this issue. But that's
nothing a little engineering can't solve for you, read on :)
On Apr 28, 2008, at 01:04, Guby wrote:
Hello dear Couchers
I understand that the views are indexed the first time they are
accessed and as far as I know there is no way to turn on view
updating on document save. I really don't understand the reasoning
behind this behavior. The advantage of the pre-populated/indexed
views are that they are blazingly fast to query and access, but
that advantage disappears when the first request after a document
update has to regenerate the view first!
I am currently building a web app where the background processes
perform a lot of writes to the database. The time it takes to write
a document is not critical for me. What is critical though is the
time it takes to load web pages for the end user that require
content from the database.
In some situations the background processes add thousands of
documents to the database within a short period of time, and when
the user tries to access a page after such an update the view
querying sometimes takes minutes and as a consequence of that the
browser times out... Not a recipe for happy customers...
The only solution I can see at the moment is to create a worker
that queries the database whenever it is told that there has been a
document update, but that seems really stupid and unnecessary. And
in my case, running on a smallish VPS, it is a big waste of resources
to have an extra worker doing something the database itself could
just as well have done. It also requires a lot of extra coding to
notify the worker whenever I update or create a document throughout
my app.
That would be a rather extreme solution. Why not, for
example, trigger a view update from your document-
insertion code, every N (N = 10, 30, 60?) seconds?
I am sure you have reasons for having implemented the views the way
you have, but I would be really interested to hear why it has been
done this way!
1) To not have a 'write penalty' for all views when
documents are added. We expect you to have
quite a few views and updating all of them on-write
seems silly. The data is generated when needed,
saving resources by 2) not clogging them up when
needed elsewhere and 3) processing large quantities
of data in batches. And finally 4) the very layout of the
bytes that make up documents on disk and the way they
are read are optimised for super-fast index creation. This
is expected to be a common operation. I still understand
that this leaves things to be desired for you.
My wishes are for an optional updating of views on save feature! In
some cases that might regenerate a view several times without it
actually being accessed in between, but that is a tradeoff I can
live with, slow views on the other hand is something I can not!
Put this in a shell script called view_trigger.sh
#!/bin/sh
counter=0
max_docs=100
while true
do
  read database
  counter=`expr $counter + 1`
  if [ $counter -ge $max_docs ]; then
    # count=0 fetches no rows; the request just triggers the index update.
    curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
    counter=0
  fi
done
and add view_trigger.sh to your couch.ini as a
DbUpdateNotificationProcess
voilà :)
Yes, this is extra work externally, but this is still a sensible
solution. From our perspective, we do not need to change
the core server behaviour to get you what you need and
you still benefit from the batching of index creation.
Also, I'd like to second what Cortland said: All views in a
design document get updated if you query one of them.
Be aware of that :)
And on a final note: Thanks for writing in. Don't be
discouraged by the replies. If there are other things that
you would love to see in CouchDB, please let us know.
Also, if enough users request a feature, we will consider
putting it in, even on-