If you're interested, we use the following pattern to do incremental updates between a database and a Lucene index.
1) Add a field called "DateUpdated" to the database table you wish to index. Update this date whenever a row in the table changes.

2) Create a new database table to store the ID of any item deleted from the table above. I'll refer to this as the "Deleted" table.

3) Have an indexer application that runs every X minutes and does the following:

   a) Load all the items from the "Deleted" table and remove them from the Lucene index.
   b) Load all the items from the main table with a "DateUpdated" value greater than the last time the indexer application ran. Delete these items from the Lucene index, and then reinsert them with the newer data.
   c) Purge all the items from the "Deleted" table.
   d) Save the date of the last "DateUpdated" item you processed, and use this date to load items the next time the indexer application runs.

This is an oversimplification, since you need to consider failover, etc., and there may be other factors that dictate your search indexing rules. But it gives you a general idea. I'd be curious to hear/discuss other solutions to this.

Monsur

On 10/19/06, Scott <[EMAIL PROTECTED]> wrote:
I have tried Senna, an embeddable full-text search engine that embeds into MySQL. http://qwik.jp/senna/ I inserted 1,000,000 documents using INSERT INTO, and I can search them using SELECT * FROM table WHERE MATCH(field_name) AGAINST('search-words'). Since it is SQL-based, it is easy to use and supports incremental updates. I haven't run benchmarks yet, but it isn't slow. I think Lucene will need to support incremental updates in the future. -- Scott
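Monsur's steps above can be sketched roughly like this. This is a minimal illustration, not the actual implementation: the schema (`items`, `deleted_items`, `date_updated` columns) is assumed, and a plain dict stands in for the Lucene index so the skeleton stays self-contained.

```python
# Sketch of the incremental-update pattern: a "Deleted" table plus a
# DateUpdated column drive deletes and re-indexing on each indexer run.
# Table and column names here are illustrative assumptions.
import sqlite3

def run_indexer(conn, index, last_run):
    cur = conn.cursor()
    # a) Remove items listed in the "Deleted" table from the index.
    for (item_id,) in cur.execute("SELECT id FROM deleted_items"):
        index.pop(item_id, None)
    # b) Load rows changed since the last run; delete them from the
    #    index, then reinsert them with the newer data.
    rows = cur.execute(
        "SELECT id, body, date_updated FROM items WHERE date_updated > ?",
        (last_run,)).fetchall()
    for item_id, body, _ in rows:
        index.pop(item_id, None)
        index[item_id] = body
    # c) Purge the "Deleted" table now that it has been processed.
    cur.execute("DELETE FROM deleted_items")
    conn.commit()
    # d) Remember the newest date_updated seen, for the next run.
    if rows:
        last_run = max(r[2] for r in rows)
    return last_run
```

In practice the returned `last_run` watermark would be persisted (and the delete/reinsert done against a real Lucene index), so a crashed run can safely be replayed from the last saved date.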