Doug Cutting wrote:
Then you need to ensure that the index has no deletions, and optimize it if it has any, to remove them. This is probably most safely done as the first step, rather than the last.
Good point. I didn't think about this.
I'm not sure this method has many advantages over what Christoph originally suggested in:
http://www.mail-archive.com/lucene-dev%40jakarta.apache.org/msg06165.html
Yes, I agree that it's not too different. The main benefit I see, and I think this may be significant for some applications, is that in Christoph's original method new documents must be iterated over twice, in his steps 2 and 4. This may be a problem for some applications because it requires buffering newly arrived documents somewhere, which is something Lucene will not directly help with. That means people may have to write substantial external code to support this usage (or perhaps use a database, file system, etc.).
With the modification I'm proposing, the documents can be added to the index as they arrive. No buffering is required and documents are handled exactly once. The "buffering" occurs instead on document ids to be deleted, which is much easier to do and one can even use the BitSet class (or Filter) supplied with Lucene.
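To make the id-buffering idea concrete, here is a minimal sketch. It assumes plain java.util.BitSet; the PendingDeletes class and its method names are hypothetical, not part of Lucene, and the actual delete call is left to the caller so that nothing here depends on a particular Lucene version.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Hypothetical helper (not a Lucene class): remembers the ids of documents
// that should be deleted later, instead of buffering the new documents.
class PendingDeletes {
    private final BitSet ids = new BitSet();

    // Record that this document id is superseded; nothing is deleted yet.
    void mark(int docId) {
        ids.set(docId);
    }

    boolean isMarked(int docId) {
        return ids.get(docId);
    }

    // Number of distinct ids currently buffered.
    int count() {
        return ids.cardinality();
    }

    // Pass every buffered id, in ascending order, to the caller's delete
    // routine, then empty the buffer.
    void flush(IntConsumer deleter) {
        for (int id = ids.nextSetBit(0); id >= 0; id = ids.nextSetBit(id + 1)) {
            deleter.accept(id);
        }
        ids.clear();
    }
}
```

New documents would be added to the index as they arrive, with mark() called for each old id they replace; flush() would then be handed whatever routine performs the real deletion against the index.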
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]