This is how I implemented incremental indexing. If anyone sees anything
wrong, please let me know.

Our motivation is similar to John Eichel's. We have a digital asset
management system and when users update, delete or create a new asset,
they need to see their results immediately.

The most important thing to know about incremental indexing is that
multiple threads cannot share the same IndexWriter, and only one
IndexWriter can be open on an index at a time.

Therefore, what I did was control access to the IndexWriter through a
singleton wrapper class that synchronizes access to the IndexWriter and
IndexReader (for deletes). After finishing writing to the index, you
must close the IndexWriter to flush the changes to the index.
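
A minimal sketch of this kind of wrapper follows. It is not the code
from our system: IndexGate and WriterSession are hypothetical names, and
WriterSession is a stand-in for the IndexWriter calls so that the
example compiles without Lucene on the classpath. Deletes through
IndexReader would go through the same lock and are left out here.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the IndexWriter operations used below. */
interface WriterSession {
    void add(String document);
    void close();
}

/** Hypothetical singleton that serializes every write to one index. */
final class IndexGate {
    private static final IndexGate INSTANCE = new IndexGate();

    // Assumption: a factory that opens a fresh writer on the index.
    private Supplier<WriterSession> opener;

    private IndexGate() {}

    static IndexGate getInstance() {
        return INSTANCE;
    }

    synchronized void configure(Supplier<WriterSession> opener) {
        this.opener = opener;
    }

    /** Opens a writer, adds one document, and always closes the writer. */
    synchronized void index(String document) {
        if (opener == null) {
            throw new IllegalStateException("configure() has not been called");
        }
        WriterSession session = opener.get();
        try {
            session.add(document);
        } finally {
            session.close(); // closing is what flushes the change
        }
    }
}
```

Because index() is synchronized on the single instance, at most one
writer session is ever open, whichever thread calls it.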

If you do this you will be fine.

However, opening and closing the index takes time, so we had to look for
ways to speed up the indexing.

The most obvious thing is that you should do as much work as possible
outside of the synchronized block. For example, in my application, the
creation of Lucene Document objects is not synchronized. Only the part
of the code between creating the IndexWriter and calling
IndexWriter.close() needs to be synchronized.
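
As a rough illustration of keeping the lock small, the sketch below
builds the document before entering the synchronized block. AssetIndexer
and its methods are hypothetical names, a Map stands in for a Lucene
Document, and an in-memory list stands in for the index itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical indexer that only locks around the actual write. */
final class AssetIndexer {
    private final Object writeLock = new Object();

    // Stand-in for the index; guarded by writeLock.
    private final List<Map<String, String>> index = new ArrayList<>();

    void indexAsset(String id, String title) {
        // Not synchronized: each thread builds its own document, so the
        // expensive preparation can run in parallel.
        Map<String, String> doc = buildDocument(id, title);

        synchronized (writeLock) {
            // In real code the writer would be opened, used and closed here.
            index.add(doc);
        }
    }

    /** Stand-in for creating a Document and adding its fields. */
    static Map<String, String> buildDocument(String id, String title) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("title", title.trim().toLowerCase());
        return doc;
    }

    int size() {
        synchronized (writeLock) {
            return index.size();
        }
    }
}
```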

The other easy thing I did to improve performance was to batch all the
changes in a transaction together for indexing. If a user changes 50
assets, they are all indexed with a single Lucene IndexWriter, opened
and closed once.
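
The batching idea can be sketched as below. BatchIndexer is a
hypothetical name, and counters stand in for the writer so the cost
being saved is visible: the open/close pair happens once per batch, not
once per asset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical indexer that opens one writer per batch of changes. */
final class BatchIndexer {
    private int sessionsOpened;                           // open/close pairs paid so far
    private final List<String> indexed = new ArrayList<>(); // stand-in for the index

    /** Indexes every asset in the transaction with a single writer session. */
    synchronized void indexBatch(List<String> assetIds) {
        if (assetIds.isEmpty()) {
            return; // nothing to write, so do not pay for an open/close
        }
        sessionsOpened++;              // stand-in for opening the writer
        for (String id : assetIds) {
            indexed.add(id);           // stand-in for adding one document
        }
        // stand-in for closing the writer, which flushes the whole batch
    }

    synchronized int sessionsOpened() {
        return sessionsOpened;
    }

    synchronized int indexedCount() {
        return indexed.size();
    }
}
```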

So far, we haven't had to explore further performance enhancements, but
if we do, the next thing I will try is a background thread that gathers
the assets that need to be indexed and runs a batch job every five
minutes or so.
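
One possible shape for such a background job, untested against a real
index: DeferredIndexer is a hypothetical name, and the batchWriter
callback is an assumption standing in for whatever code opens the
IndexWriter, writes the batch and closes it. Note that changes only
become searchable after the next flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical collector that indexes queued assets on a timer. */
final class DeferredIndexer {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<String>> batchWriter;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    DeferredIndexer(Consumer<List<String>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    /** Starts the periodic job, for example start(5, TimeUnit.MINUTES). */
    void start(long period, TimeUnit unit) {
        timer.scheduleWithFixedDelay(() -> {
            try {
                flush();
            } catch (RuntimeException e) {
                // Swallowed so one failed batch does not cancel the timer;
                // real code would log this.
            }
        }, period, period, unit);
    }

    /** Called by request threads; cheap and never touches the index. */
    void submit(String assetId) {
        pending.add(assetId);
    }

    /** Drains everything queued so far and writes it as one batch. */
    synchronized void flush() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
        }
    }

    /** Stops the timer and writes whatever is still queued. */
    void stop() {
        timer.shutdown();
        flush();
    }
}
```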

Hope this is helpful,
Luke

