Re: Lazy Fulltext Search

Jan Lehnardt Tue, 15 Apr 2008 05:06:48 -0700


On Apr 15, 2008, at 02:01, Nils Adermann wrote:

Hi,
I agree with Søren that this is not necessarily a good idea. It is not trivial for an indexer to figure out which view results changed. One method to do so is storing all indexed view results and then comparing them to the updated view once the indexer is called. This is a needless waste of resources. Updating the view index based on changed documents is even more difficult: you would have to recompute the view at least partially to find out which view results changed. Given the reduce step, this means that any number of documents, including unchanged ones, could be involved. This creates a lot of work.

Yeah, but it doesn't actually matter who does the work :) So we'd rather keep that out of CouchDB.
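To make the "store and compare" approach concrete, here is a minimal sketch of what such an indexer would have to do. It assumes, hypothetically, that view results can be flattened into a mapping from a row identifier to the row's value; `diff_view_results` and the sample data are illustrative names, not part of CouchDB or any fulltext integration.

```python
from typing import Any, Dict, List, Tuple


def diff_view_results(
    old: Dict[str, Any], new: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str], Dict[str, Any]]:
    """Compare a stored snapshot of view rows with a freshly computed one.

    Both arguments map a (hypothetical) row identifier to the row's value.
    Returns (added rows, ids of removed rows, rows whose value changed).
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    changed = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return added, removed, changed


# The indexer has to keep `snapshot` around between runs: a full second copy
# of every row it has ever indexed, which is the cost objected to above.
snapshot = {"post-1": "hello world", "post-2": "lazy views"}
fresh = {"post-1": "hello world", "post-2": "lazy fulltext", "post-3": "new"}
print(diff_view_results(snapshot, fresh))
```

The diff itself is cheap; the expense is that `snapshot` must be as large as the view and that `fresh` requires the whole view to be read again on every run.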

I think the problem we face here is different usage patterns of views. There are views which process a lot of data and which are based on documents that are updated frequently, but which might only be read infrequently. These views profit from JIT computation. However, many applications use views which are infrequently updated but often queried or searched. Such views benefit from live updating. If an application allows searching data, it nearly always means that the data will be read more frequently than it is updated. So, in conclusion, both methods (JIT and live updates) make sense for views, but search normally only needs the live update mechanism. I believe it should become configurable whether a view is updated immediately after a change or only when it is queried. Fulltext search would always work on views with immediate updates; the indexer would be notified about the changed results. On views which delay updates, search would only work if the fulltext search provides a mechanism to compare the new view results to the old ones.
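The proposed per-view setting can be illustrated with a toy model. This is a sketch under heavy assumptions: `UpdatePolicy`, `ToyView` and the `on_change` callback are hypothetical, each document yields exactly one row, and there is no reduce step and no deletion. It only shows *when* the update work happens and when an indexer would hear about changed rows under each policy.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional, Set


class UpdatePolicy(Enum):
    IMMEDIATE = "immediate"  # refresh the view on every write (live updates)
    LAZY = "lazy"            # refresh only when the view is queried (JIT)


class ToyView:
    """One row per document, no reduce step and no deletions: a deliberate
    simplification to show where the update work happens."""

    def __init__(
        self,
        map_fn: Callable[[dict], Any],
        policy: UpdatePolicy,
        on_change: Optional[Callable[[str, Any], None]] = None,
    ) -> None:
        self.map_fn = map_fn
        self.policy = policy
        self.on_change = on_change  # e.g. a fulltext indexer's callback
        self.docs: Dict[str, dict] = {}
        self.rows: Dict[str, Any] = {}
        self.pending: Set[str] = set()  # ids written since the last refresh

    def save(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        if self.policy is UpdatePolicy.IMMEDIATE:
            self._reindex(doc_id)
        else:
            self.pending.add(doc_id)

    def query(self) -> Dict[str, Any]:
        # A lazy view pays for all pending writes here, on the first read.
        for doc_id in sorted(self.pending):
            self._reindex(doc_id)
        self.pending.clear()
        return dict(self.rows)

    def _reindex(self, doc_id: str) -> None:
        value = self.map_fn(self.docs[doc_id])
        if doc_id not in self.rows or self.rows[doc_id] != value:
            self.rows[doc_id] = value
            if self.on_change is not None:
                # Only rows that really changed are reported to the listener.
                self.on_change(doc_id, value)
```

With `IMMEDIATE`, the listener is called during `save`; with `LAZY`, nothing happens until the next `query`, which is exactly why a fulltext index attached to a lazy view can fall behind.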

Just query the view with ?count=0 after your inserts to trigger an update, and you have the synchronous update behaviour.
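A minimal sketch of that trick from a client, using only the Python standard library. It assumes the view URL layout of CouchDB releases from this period (`/<db>/_view/<design>/<view>`); later releases moved views to `/<db>/_design/<design>/_view/<view>` and renamed `count` to `limit`, so adjust the path and parameter for your server. The function names and the example database are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def view_refresh_url(base: str, db: str, design: str, view: str) -> str:
    """Build a view URL that asks for zero rows.

    Assumed (2008-era) layout: <base>/<db>/_view/<design>/<view>?count=0
    """
    parts = (db, "_view", design, view)
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base.rstrip('/')}/{path}?{urlencode({'count': 0})}"


def refresh_view(base: str, db: str, design: str, view: str,
                 timeout: float = 60.0) -> int:
    """Query the view so the server brings its index up to date while
    returning no rows; the call returns once the response arrives."""
    with urlopen(view_refresh_url(base, db, design, view), timeout=timeout) as resp:
        return resp.status
```

Calling something like `refresh_view("http://localhost:5984", "blog", "posts", "by_title")` right after a batch of inserts makes the writer, rather than the next reader, pay for the index update.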

Cheers
Nils

Jan Lehnardt wrote:
On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
Hi
Have you read Chris' response about letting the view engine call the indexer, as it has the information the indexer needs? As I understand the idea,
it will essentially keep the fulltext indexer and the views in sync.
I like this idea and I believe the code for the indexer would be much simpler
and more efficient.
Also, as the shift goes towards indexing views and not documents, it makes
sense that it is the View engine that triggers the indexer, right?
The only problem here is that views are updated when they are queried, not when documents are added. So you could end up with a lot of unindexed data because your view hasn't been queried. That can be worked around, but I don't think it makes things any easier :)
The design of the update notification is intentionally simple. We expect the clients (the Indexer in this case) to be smart. We believe that this makes the server code more robust.
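To show how little a "simple notification, smart client" split asks of the server, here is a sketch of the client side. It assumes the notification process receives one JSON object per line, such as `{"type": "updated", "db": "blog"}`; treat that message format as an assumption and check it against your CouchDB version. `updated_databases` is a hypothetical helper: it only learns *which* databases changed, and finding out *what* changed (by polling) is left to the indexer.

```python
import json
from typing import Iterable, Set


def updated_databases(lines: Iterable[str]) -> Set[str]:
    """Collect the names of databases reported as updated.

    Assumes one JSON object per line, e.g. {"type": "updated", "db": "blog"}.
    Blank lines, unparsable lines and other event types are ignored.
    """
    dbs: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate anything we do not understand
        if isinstance(event, dict) and event.get("type") == "updated" and "db" in event:
            dbs.add(event["db"])
    return dbs
```

A notification process would feed this from standard input, batching lines for a moment and then polling each returned database for changes, so many rapid writes collapse into one indexing pass.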
I have to study the View engine if I am to provide any code for this, though
(provided consensus blows in this direction).

Have fun
 Søren
On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
Hi Jan

It certainly would simplify configuration, although the
DbUpdateNotificationProcess setting ought to be retained, as it is
potentially useful for things other than indexing (can you have more than
one of these set up?)
No, the update searcher will stay! :-)
I am also worried about response times for searching; potentially the
indexing can take considerable time. With the current approach,
indexing can be done during off-peak hours and only searching is done at prime time.
Right, if you want to be conservative with resources, you might want to go
with my approach at the expense of possibly higher response times the
first time things are searched for (as it is with views). I just wanted to make
available my idea that fulltext indexing could be modelled after how views
work, in case this is useful for a specific scenario.

Cheers
Jan
--
Have fun
Søren
--
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: [EMAIL PROTECTED]
DK-8961  Allingåbro  Web: www.widetrail.dk

On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
Heya,
while thinking more about the fulltext implementation, I began to
wonder why we don't model it after the view engine.
At the moment, we have an Indexer waiting for update notifications and
polling CouchDB for changes, and a separate mechanism to register a
fulltext query Searcher that looks up things in the index.
My proposed architectural change would be to trigger the Indexer from the Searcher module when a request comes in, just like views work.
This would delay the creation of fulltext indexes until they are
actually needed.
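A minimal sketch of the proposed flow, where the Searcher asks the Indexer to catch up before every lookup, much as a view query brings the view index up to date first. Everything here is hypothetical and in-memory: `ToyDatabase` stands in for CouchDB's per-write update sequence, and the "fulltext index" is a naive inverted index over whitespace-separated, lowercased words.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class ToyDatabase:
    """Stand-in for CouchDB: every write gets an increasing update sequence."""

    def __init__(self) -> None:
        self.seq = 0
        self.log: List[Tuple[int, str, str]] = []  # (seq, doc_id, text)

    def save(self, doc_id: str, text: str) -> None:
        self.seq += 1
        self.log.append((self.seq, doc_id, text))

    def changes_since(self, seq: int) -> List[Tuple[int, str, str]]:
        return [entry for entry in self.log if entry[0] > seq]


class LazyIndexer:
    def __init__(self, db: ToyDatabase) -> None:
        self.db = db
        self.last_seq = 0  # nothing is indexed until the first search
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)
        self.doc_terms: Dict[str, Set[str]] = {}

    def catch_up(self) -> None:
        """Index only what changed since the last call."""
        for seq, doc_id, text in self.db.changes_since(self.last_seq):
            # Drop the document's old terms before adding the new ones.
            for term in self.doc_terms.get(doc_id, set()):
                self.postings[term].discard(doc_id)
            terms = set(text.lower().split())
            for term in terms:
                self.postings[term].add(doc_id)
            self.doc_terms[doc_id] = terms
            self.last_seq = seq


class Searcher:
    def __init__(self, indexer: LazyIndexer) -> None:
        self.indexer = indexer

    def search(self, term: str) -> List[str]:
        self.indexer.catch_up()  # build the index lazily, like a view
        return sorted(self.indexer.postings.get(term.lower(), set()))
```

The first search after a burst of writes pays for indexing all of them, which is the response-time drawback discussed in this thread; a periodic (e.g. cron-ed) search keeps that cost small.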

The possible drawback, though, is that when building the fulltext
index is rather slow, old-style pre-calculation might be more feasible. Views
deal with that by requiring frequent requests (possibly cron-ed).
This is not a proposal or anything, just a thought I wanted to share
with those who work on fulltext integration.

If you have any input on this, please let us know ;)

Cheers
Jan