Hi all,

I have about 10 million small records (less than 1 KB each) that I want to index with Lucy (through the Perl frontend). The primary data store is a relational database.

So I create my search index, wait a day, and then want to index all the new records/documents. To find out which records are new, I have to know which ones are already indexed. With 10 million records (and only a few thousand new ones each day), checking each record individually is not efficient, so I'd like to store something like a "last indexed ID" or "last indexed timestamp" along with the search index.
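
To make the idea concrete, here is a rough sketch of the daily run I have in mind. It is written in Python with an in-memory SQLite database purely as a stand-in for my real Perl/RDBMS setup. The table `records`, its columns `id` and `body`, and the function `fetch_new_records` are made-up names for illustration, and the sketch assumes IDs only ever increase. Where `last_indexed_id` gets persisted between runs is deliberately left open, since that is exactly what I am asking about.

```python
import sqlite3


def fetch_new_records(conn, last_indexed_id):
    """Return (rows, new_mark) for all records with id > last_indexed_id.

    new_mark is the highest id seen, or the old mark if nothing is new.
    Assumes ids are assigned in increasing order (hypothetical schema).
    """
    rows = conn.execute(
        "SELECT id, body FROM records WHERE id > ? ORDER BY id",
        (last_indexed_id,),
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_indexed_id
    return rows, new_mark


# Stand-in for the real database: three records, the first already indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records (id, body) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)

# Only records 2 and 3 would be handed to the indexer on this run.
rows, mark = fetch_new_records(conn, last_indexed_id=1)
```

With something like this, each run only touches the few thousand new rows, and the only state that has to survive between runs is the single value `mark`.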

Is there any way to store such metadata along with the search index?

(I know I could store it inside the RDBMS, but that doesn't feel right from an architectural point of view: the RDBMS shouldn't care about the existence of the search index at all, nor do I want to lose information about the search index when I overwrite the contents of my RDBMS database with a backup.)

How do other people solve that problem?

Cheers,
Moritz
