Re: Lazy Fulltext Search

Nils Adermann Wed, 16 Apr 2008 16:16:00 -0700

Hi,

Jan Lehnardt wrote:

Heya Søren,
On Apr 15, 2008, at 15:27, Soren Hilmer wrote:
I guess what all this boils down to is that:
When a database changes, you need to re-index all the views in the
fulltextsearch design document.
If you take this route, yes.
There is no way incremental changes can be made to the index, as one document
change may potentially change several view results within the same view.
Right?
Yup.
Eventually, I think, we will be able to have CouchDB calculate the
intersection of all FT hits and a view index for you. So the FT
indexer will only need to index the whole DB and CouchDB filters out
all matching documents that are not in the requested view for you. For
now, you've got to do it yourself.

That's not even possible, because a view (written in JS) could return
data that is not directly in a document, either by combining information
from multiple documents or by generating new content based on some
document values. You would never be able to search such content.
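
To make the distinction concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the documents, `search_docs`, `filter_hits_by_view`); none of it is a CouchDB API. Intersecting fulltext hits with a view works as long as each view row still points at a document ID, but a value computed across several documents never appears in any indexed document, so a document-level search cannot find it.

```python
# Hypothetical documents, as a fulltext indexer over the whole DB would see them.
docs = {
    "a": {"author": "jan", "words": 120},
    "b": {"author": "jan", "words": 80},
    "c": {"author": "nils", "words": 50},
}


def search_docs(term):
    """Toy fulltext search: IDs of documents with a value containing the term."""
    return {doc_id for doc_id, doc in docs.items()
            if any(term in str(value) for value in doc.values())}


def filter_hits_by_view(ft_hits, view_rows):
    """Keep only the hits whose document ID also appears in the view rows."""
    view_ids = {row["id"] for row in view_rows}
    return ft_hits & view_ids


# A map-style view: every row still refers to exactly one document.
map_rows = [{"id": "a", "key": "jan"}, {"id": "b", "key": "jan"}]

# A reduce-style result: total words per author, tied to no single document.
reduced = {"jan": 200, "nils": 50}

hits_in_view = filter_hits_by_view(search_docs("jan"), map_rows)
derived_hits = search_docs("200")  # the total 200 exists only in the view
```

Here `hits_in_view` contains the two documents by "jan", while `derived_hits` is empty: the total 200 exists only in the reduced view output.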

On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:

On Apr 15, 2008, at 02:01, Nils Adermann wrote:

Hi,

I agree with Søren that this is not necessarily a good idea. It is
not trivial for an indexer to figure out which view results changed.
One method to do so is storing all indexed view results and then
comparing them to the updated view once the indexer is called. This
is a needless waste of resources. Updating the view index based on
changed documents is even more difficult. You would have to
recompute the view at least partially to find out which view results
changed. Given the reduce step, this means that any number of
documents, including unchanged ones, could be involved. This creates
a lot of work.
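
A rough Python sketch of the "store and compare" method described above, assuming a view snapshot can be represented as a plain dict from row key to value. The function name and the sample data are hypothetical.

```python
def diff_view_results(old_rows, new_rows):
    """Compare two view snapshots given as {row_key: value} dicts.

    Returns (added, removed, changed) as sets of row keys.
    """
    old_keys, new_keys = set(old_rows), set(new_rows)
    added = new_keys - old_keys
    removed = old_keys - new_keys
    changed = {k for k in old_keys & new_keys if old_rows[k] != new_rows[k]}
    return added, removed, changed


# Hypothetical reduce output (word count per author) before and after an update.
old = {"jan": 200, "nils": 50, "soren": 10}
new = {"jan": 230, "nils": 50, "damien": 5}

added, removed, changed = diff_view_results(old, new)
```

Note what this costs: the indexer has to keep a full copy of the previous results and read the complete new results on every run, even when only one row differs.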


Yeah, but it doesn't actually matter who does the work :) So we'd rather
keep that out of CouchDB.

Err, I wasn't saying the question is where it takes place. I was saying
you have to do the work twice instead of just once if we follow your way.

I think the problem we face here is different usage patterns of
views. There are views which process a lot of data and which are
based on documents that are updated frequently. But they might only
be read from infrequently. These views profit from JIT computation.
However, many applications use views which are infrequently updated
but often queried or searched. Such views benefit from live
updating. If an application allows searching data, it nearly always
means that the data will be read more frequently than it is updated.
So in conclusion both methods (JIT and live updates) make sense for
views. But search normally only needs the live update mechanism. I
believe it should become configurable whether a view is updated
immediately after a change or only after a query takes place.
Fulltext search would always work on views with immediate updates.
The indexer would be notified about the changed results. On views
which delay updates, search would only work if the fulltext search
provides a mechanism to compare the new view results to the old ones.
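
A toy Python model of the configurable behaviour proposed above. This is only a sketch under simplifying assumptions: the `View` class, its `immediate` flag and the `on_change` callback are all hypothetical, and it models a map-only view with one row per document (no reduce step).

```python
class View:
    """Toy view that maps each document to one row via map_fn."""

    def __init__(self, map_fn, immediate, on_change=None):
        self.map_fn = map_fn
        self.immediate = immediate  # True: update on write; False: on query
        self.on_change = on_change  # callback standing in for the FT indexer
        self.rows = {}
        self.pending = {}           # doc_id -> doc, writes not yet applied

    def document_saved(self, doc_id, doc):
        self.pending[doc_id] = doc
        if self.immediate:
            self._apply_pending()

    def query(self):
        self._apply_pending()       # delayed views catch up here
        return dict(self.rows)

    def _apply_pending(self):
        for doc_id, doc in self.pending.items():
            row = self.map_fn(doc)
            if self.rows.get(doc_id) != row:
                self.rows[doc_id] = row
                if self.on_change:
                    self.on_change(doc_id, row)  # only changed rows are reported
        self.pending.clear()


def title_row(doc):
    return doc["title"].lower()


live_seen, lazy_seen = [], []
live = View(title_row, immediate=True,
            on_change=lambda i, r: live_seen.append((i, r)))
lazy = View(title_row, immediate=False,
            on_change=lambda i, r: lazy_seen.append((i, r)))

live.document_saved("a", {"title": "Lazy Search"})
lazy.document_saved("a", {"title": "Lazy Search"})
lazy_before_query = list(lazy_seen)
lazy.query()
```

With `immediate=True` the indexer callback fires as soon as the document is saved; with `immediate=False` nothing reaches the indexer until someone queries the view.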


Just query the view with ?count=0 to trigger an update after your
inserts and you have the synchronous update behaviour.
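
For reference, the suggestion above could be scripted roughly as follows in Python. Only the `count=0` parameter comes from the message; the host, port, database name and view path are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def view_update_url(base_url, view_path, count=0):
    """Build a view URL that requests `count` rows (count=0 as suggested)."""
    query = urlencode({"count": count})
    return f"{base_url.rstrip('/')}/{view_path.lstrip('/')}?{query}"


def trigger_view_update(base_url, view_path):
    """Issue the GET and discard the body; returns the HTTP status code."""
    with urlopen(view_update_url(base_url, view_path)) as response:
        response.read()
        return response.status


# Placeholder server address and view path, for illustration only.
url = view_update_url("http://localhost:5984", "mydb/_view/search/by_title")
```

Calling `trigger_view_update(...)` after a batch of inserts would then force the view to catch up before the indexer runs.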

If we really do things your way, that'd mean the entire database and all
searchable views need to be reindexed completely after every single
update. You're creating a huge amount of useless work for the indexer.


Cheers
Nils
