Re: [Chandler-dev] Chandler background full-text indexing

Heikki Toivonen Tue, 23 May 2006 20:30:02 -0700

Andi Vajda wrote:
> The background indexer runs every minute or so. This value is hardcoded.
> At some point we need to have support for user preferences and we can
> then tie that value in with them. If you're in a real hurry to have your
> stuff indexed in the background right away, you can use the
> repository.notifyIndexer() API.


While once a minute is a good proof of concept, I believe we need a
centralized way to force indexing to happen before using functionality
that requires up-to-date indexes.

For example, suppose a user synchronizes their collections and follows
up with a search. If the indexer hasn't run yet, the search will not
find the newly synced items (and will return garbage for changed items).

It is neither scalable nor reliable to add spot checks to the code to
force indexing just before actions that we know will need fresh indexes
(such as running the indexer before executing a search).

I am not sure where the choke point should be - in the repository itself
or some layer above it.
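
To make the idea concrete, here is a minimal sketch of what such a
choke point might look like. Everything in it is hypothetical: IndexGate,
mark_dirty, ensure_fresh and requires_fresh_index are names invented for
illustration, and run_indexer is a stand-in for whatever synchronous
indexing call the repository would actually expose. It is not the
repository API.

```python
import functools
import threading


class IndexGate:
    """Hypothetical choke point that tracks whether unindexed changes exist."""

    def __init__(self, run_indexer):
        # run_indexer: a callable that indexes all pending changes
        # synchronously (an assumption; stand-in for the real indexer).
        self._run_indexer = run_indexer
        self._lock = threading.Lock()
        self._dirty = False

    def mark_dirty(self):
        """Call after any change that the indexer has not yet seen."""
        with self._lock:
            self._dirty = True

    def ensure_fresh(self):
        """Run the indexer, but only if changes are pending."""
        with self._lock:
            if self._dirty:
                self._run_indexer()
                self._dirty = False

    def requires_fresh_index(self, func):
        """Decorator for operations that need up-to-date indexes."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.ensure_fresh()
            return func(*args, **kwargs)
        return wrapper


# Toy usage: two lists stand in for pending changes and the index.
pending, indexed = [], []


def run_indexer():
    indexed.extend(pending)
    del pending[:]


gate = IndexGate(run_indexer)


def sync(items):
    pending.extend(items)
    gate.mark_dirty()


@gate.requires_fresh_index
def search(term):
    return [item for item in indexed if term in item]
```

With something like this, a sync followed immediately by a search would
see the new items without the search code knowing anything about the
indexer. Whether the gate belongs in the repository or a layer above it
is exactly the open question.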

> PyLucene indexing is also considerably faster now. I realized that the

Looking at Tinderbox perf data, the new code more than halved the time
it takes to import a large calendar. All in all, our new code is about
5% faster than it was before we started indexing (on Windows; I didn't
check other platforms).

We will need new tests, and may need to modify existing ones, to work
with indexing in a deterministic way, measure actual indexing
performance, and so on.
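
As a rough illustration of what "deterministic" could mean here, the
sketch below has the test trigger indexing explicitly instead of waiting
for a timer. FakeIndexer and its methods are invented for this example
and only stand in for the real background indexer.

```python
import unittest


class FakeIndexer:
    """Hypothetical stand-in for the background indexer."""

    def __init__(self):
        self.pending = []
        self.index = set()

    def add(self, text):
        """Queue an item for indexing."""
        self.pending.append(text)

    def run_once(self):
        """Index everything pending and return how many items were indexed."""
        count = len(self.pending)
        self.index.update(self.pending)
        self.pending = []
        return count

    def search(self, text):
        """Return True if the exact text has been indexed."""
        return text in self.index


class IndexingTest(unittest.TestCase):
    def test_item_found_only_after_forced_index(self):
        indexer = FakeIndexer()
        indexer.add("imported event")
        # Not indexed yet, so the search must miss.
        self.assertFalse(indexer.search("imported event"))
        # Force indexing from the test rather than sleeping for a minute.
        self.assertEqual(indexer.run_once(), 1)
        self.assertTrue(indexer.search("imported event"))
```

The same explicit trigger would also give perf tests a well-defined
span to time, since the indexing work happens at a known point.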

-- 
  Heikki Toivonen


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
