Hi all,
I have about 10 million small records (less than 1 KB each) that I want
to index with Lucy (through the Perl frontend). The primary data store
is a relational database.
So I create my search index, wait a day, and then want to index all the
new records/documents. To find out which records are new, I have to
know which ones are already indexed. With 10 million records (and only a few
thousand new ones each day) it's not efficient to check each one, so I'd like
to store something like a "last indexed ID" or "last indexed timestamp"
along with the search index.
Is there any way to store such metadata along with the search index?
(I know I could store it inside the RDBMS, but that doesn't feel right
from an architectural point of view; the RDBMS shouldn't care about the
existence of the search index at all; nor do I want to lose information
about the search index when I overwrite the contents of my RDBMS database
with a backup).
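To make the idea concrete, here is a rough sketch of the kind of thing I
mean: a small checkpoint file kept next to the index directory and replaced
atomically after each indexing run. It is in Python only for illustration
(my indexer is Perl) and does not use any Lucy API; the file name and the
load_checkpoint/save_checkpoint helpers are made up, and I'm assuming that a
sibling file is safer than putting anything inside the index directory itself.

```python
import json
import os
import tempfile


def checkpoint_path(index_dir):
    # Hypothetical convention: a sibling file named after the index directory,
    # so the search library never sees or manages it.
    return os.path.abspath(index_dir) + ".checkpoint.json"


def load_checkpoint(index_dir, default=0):
    """Return the last indexed ID stored next to the index, or `default`."""
    try:
        with open(checkpoint_path(index_dir), encoding="utf-8") as fh:
            return json.load(fh)["last_indexed_id"]
    except FileNotFoundError:
        # No checkpoint yet: treat everything as new.
        return default


def save_checkpoint(index_dir, last_id):
    """Write the checkpoint to a temp file, then rename it into place."""
    path = checkpoint_path(index_dir)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_indexed_id": last_id}, fh)
    # os.replace is atomic when source and target are on the same filesystem.
    os.replace(tmp, path)
```

The indexer would call load_checkpoint before asking the database for rows
with a larger ID, and save_checkpoint only after the index commit has
succeeded, so a failed run just gets repeated the next day.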
How do other people solve that problem?
Cheers,
Moritz