That's what I thought. So, that leads me to . . . is it necessarily all that much faster to index in an incremental update fashion, rather than just clobbering the old index?
On Mon, Nov 10, 2008 at 12:52 PM, Erick Erickson <[EMAIL PROTECTED]>wrote: > You have to have indexed something that uniquely identifies the > document in order to know what the old one is. Really, this is > the same question as updating, isn't it? If you could update > a document in place, you'd have to know what document > that was. If you know that information, you know which > document to delete. > > Note that lucene has no built-in document recognition. If I > add the same document to the index twice, Lucene will > happily consider them two *separate* documents. You have > to code your own notion of document meta-id (as distinct > from the Lucene doc id). It could be the URL, the file path > on disk, a document ID from your organization... the > possibilities are endless. Which is why Lucene can't do that > for you. > > Best > Erick > > On Mon, Nov 10, 2008 at 2:22 PM, ChadDavis <[EMAIL PROTECTED] > >wrote: > > > In the FAQ's it says that you have to do a manual incremental update: > > > > How do I update a document or a set of documents that are already > indexed? > > > > > > There is no direct update procedure in Lucene. To update an index > > > incrementally you must first *delete* the documents that were updated, > > and > > > *then re-add* them to the index. > > > > > > > How do I determine the existing document that matches the document I am > > updating? > > >