I've used Lucene for a long time, but only in the most basic way. I have a custom analyzer and a slightly hacked query parser, but in general it's the basic add document/remove document/query documents cycle.

In my system, I'm indexing a store of external documents and maintaining an index for full-text querying. However, my indexer might be shut down while documents are being added to the store. When it's restarted, it needs to determine the timestamp of the last document added to the index so that it can pick up where it left off.

I can see three approaches to doing this, two of which use Lucene itself. I don't know how to implement the two Lucene approaches, or even whether they're possible.

1. Just keep a file in parallel with the index, reading and writing the timestamp of the last indexed document in it. I know how to do this, but I don't like the idea of keeping a separate file.

2. Drop a timestamp onto each document as it's indexed. I've attached timestamp fields to documents in the past so that I could do range queries on them. However, I don't know how to do a query like "the document with the latest timestamp" or even if that's possible.

3. Create a dummy document (with some unique field identifier so you could quickly query for it) with a field "last timestamp". This is a "global value storage" approach, as you could just store any field with any value on it. But I'd be updating this timestamp field a lot, which means that every time I updated the index I'd have to remove this special document and reindex it. Is there any way to update the value of a field in a document directly in the index without removing and adding it again to the index? The field I'd want to update would just be stored, not indexed or tokenized.
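For reference, option 1 (the side file) would look roughly like the following. This is only a minimal sketch of the idea, not my actual code: the class name LastIndexedTimestamp, the ".tmp" suffix, and the -1 "nothing indexed yet" sentinel are all made up for illustration. It writes to a temporary file and then moves it over the real one, which should reduce the chance of a half-written timestamp if the process dies mid-write, though how atomic that move is depends on the platform.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: keeps the timestamp of the last indexed
// document in a small text file stored next to the index.
final class LastIndexedTimestamp {

    // Returns the saved timestamp in milliseconds, or -1 if no
    // timestamp file has been written yet (assumed sentinel value).
    static long read(Path file) throws IOException {
        if (!Files.exists(file)) {
            return -1L;
        }
        byte[] bytes = Files.readAllBytes(file);
        String text = new String(bytes, StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    // Writes the timestamp to a sibling temp file first, then moves
    // it over the real file so readers never see a partial write.
    static void write(Path file, long timestampMillis) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        byte[] bytes = Long.toString(timestampMillis).getBytes(StandardCharsets.UTF_8);
        Files.write(tmp, bytes);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

On startup I'd call read() and resume from that timestamp; after each document is indexed I'd call write() with that document's timestamp. What bothers me is that this file and the index can drift apart, which is why I'm asking about options 2 and 3.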

Thanks for your help in guiding my exploration into the capabilities of Lucene.

Avi

--
Avi 'rlwimi' Drissman
[EMAIL PROTECTED]
Argh! This darn mail server is trunca

