First of all, let me apologize for the double post, but I got some strange error message =\
> The first question is what do you mean the document
> is already in the index? Lucene doc IDs are useless
> here since the ones in your FSDir and the ones in your
> RAMdir are unrelated. In fact, I suspect that the
> lucene docIDs will start at the same number in both.

The documents have two fields, one of them being their identifier, which is Indexed and Not Tokenized (and stored).

> Lucene doc IDs are just monotonically incremented integers.
> So, how do you identify identical documents? Is there some
> field in your document that's guaranteed to be unique to each
> document? If so, *that's* the field you can use for termEnum
> to get the Lucene docid to remove, assuming you've indexed
> it UN_TOKENIZED or you are very, very, very confident that
> your tokenizers won't break it up.
> But you can make this easier by using IndexReader.deleteDocument(term)
> where the term is your unique field.

Since I have a unique field for each document, I'll use that. However, what is the thinking behind the enum/delete approach? Does the termEnum list the terms with a particular field value and give me their Lucene doc IDs, so that the reader can then delete them?

> Additionally, I question why you bother with a RAMdir for your changes.
> An index reader essentially takes a snapshot of your index, and
> subsequent changes are not seen by your searchers until you
> close and reopen the underlying reader. What advantage do you
> see in using a RAMdir?

I use a RAMdir because it is somewhat faster. I'm downloading a few thousand documents at a time from the internet, indexing them in a RAM index, and then merging them into the FS dir. Also, in case anything fails, my FS directory, and thus my "stable" index, is safe from harm :) But why did you ask? Advice is welcome :)

Thanks,
João Rodrigues
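
For concreteness, here is roughly what the "enum/delete" idea looks like spelled out in code, next to the one-call shortcut. This is only a sketch against a Lucene 2.x-style API; the field name "id", the value "doc-42", and the index path are placeholders.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.store.FSDirectory;

    public class DeleteByUniqueField {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
            Term idTerm = new Term("id", "doc-42");   // the unique, UN_TOKENIZED field

            // The "enum/delete" idea: walk the postings for the term and
            // delete every Lucene doc ID that carries it.
            TermDocs td = reader.termDocs(idTerm);
            while (td.next()) {
                reader.deleteDocument(td.doc());
            }
            td.close();

            // The shortcut does the same walk internally in a single call:
            // reader.deleteDocuments(idTerm);

            reader.close();   // closing the reader flushes the deletions to disk
        }
    }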
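
And here is a sketch of the batch-in-RAM-then-merge pattern I described, again against a Lucene 2.x-style API; the analyzer, field names, and fetchNextBatch() are placeholders standing in for the real download step.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchMerge {
        public static void main(String[] args) throws Exception {
            Directory fsDir = FSDirectory.getDirectory("/path/to/index");
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // 1. Build a small throwaway index in RAM for the current batch.
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
            for (String[] doc : fetchNextBatch()) {   // stand-in for the download step
                Document d = new Document();
                d.add(new Field("id", doc[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
                d.add(new Field("contents", doc[1], Field.Store.NO, Field.Index.TOKENIZED));
                ramWriter.addDocument(d);
            }
            ramWriter.close();

            // 2. Merge the finished batch into the on-disk index in one step.
            //    If anything fails before this point, the FS index is untouched.
            IndexWriter fsWriter = new IndexWriter(fsDir, analyzer, false);
            fsWriter.addIndexes(new Directory[] { ramDir });
            fsWriter.close();
        }

        // Placeholder for the "download a few thousand documents" step.
        private static String[][] fetchNextBatch() {
            return new String[][] { { "doc-1", "some downloaded text" } };
        }
    }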