> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>
> >It seems that either a) deletes should be write-through, or
> >b) deletes should be done by the writer, or c) writer should not
> >optimize non-RAM segments unless asked to. As a client, I like
> >option b) the best, though, this is not the easiest option to
> >implement. My $0.02
>
> Or maybe
> d) when merging, a writer should share an in-memory image of segment1
> and prohibit any deletes on segment one while merge is in progress?
Or maybe:

e) Deleting from a reader while an IndexWriter is open on the same index
should throw an exception. This just requires the delete code to obtain
the write.lock.

Deletions and additions must happen serially. In particular, the intended
order of operations is:

  reader.open();
  reader.deleteDocument(...);
  reader.close();
  writer.open();
  writer.addDocument(...);
  writer.close();

The bug is that this is neither enforced nor well documented. Let's fix
that first.

Another bug might be that IndexWriter is a misnomer: it should really be
called something like DocumentAdder.

> Personally, I would also like to see deletion moved into the writer.

And I'd like to see cars outlawed. Yes, this would be a cleaner API, but
it would also encourage folks to write less efficient index-updating code.
The most efficient approach is to batch deletions and additions
separately; intermingling them will never be as fast. The current API
encourages one to work this way. Also, the deletion code is currently
very simple and easy to maintain. Optimizing intermingled additions and
deletions would require a lot of new code, substantially complicating
Lucene and likely introducing bugs.

Some background: to delete a document we need an IndexReader to find its
document number. To add a document we just add a new segment, opening no
readers. Periodically, a subset of the segments is opened by a reader to
merge them.

If deletion were added to IndexWriter, the writer would need an
IndexReader open on all segments in order to find the document number and
mark it as deleted. Each time a document is added or segments are merged,
this reader must be invalidated. Re-opening the IndexReader each time a
document is deleted would be very inefficient, so code would need to be
added to incrementally update a SegmentsReader in light of document
additions and merges. Such a reader could also be optimized to open only
the files required for deletion.
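Option e) amounts to making the deleting reader take the same exclusive
lock that an open writer already holds. Here is a minimal Java sketch of
that idea, assuming the lock is implemented by atomically creating a file
named write.lock in the index directory. The WriteLock class and its
obtain() method are hypothetical illustrations, not Lucene's actual
locking API.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not Lucene's API): an exclusive "write.lock" file
 * that must be held both by code adding documents and by code deleting
 * them, so the two can only run one after the other.
 */
final class WriteLock implements AutoCloseable {
    private final Path lockFile;

    private WriteLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Obtains the lock, or throws if a writer or deleter already holds it. */
    static WriteLock obtain(Path indexDir) throws IOException {
        Path lockFile = indexDir.resolve("write.lock");
        try {
            // createFile is atomic: it fails if the file already exists.
            Files.createFile(lockFile);
        } catch (FileAlreadyExistsException e) {
            throw new IOException("Index locked for write: " + lockFile, e);
        }
        return new WriteLock(lockFile);
    }

    /** Releases the lock by deleting the lock file. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");

        // A "writer" is open and holds write.lock.
        WriteLock writer = WriteLock.obtain(indexDir);

        // A delete attempted now fails instead of racing the writer.
        boolean rejected = false;
        try (WriteLock deleter = WriteLock.obtain(indexDir)) {
            // deletions would happen here
        } catch (IOException e) {
            rejected = true;
        }
        writer.close();

        // Once the writer is closed, the deleter can obtain the lock.
        try (WriteLock deleter = WriteLock.obtain(indexDir)) {
            System.out.println("delete while writer open rejected: " + rejected);
            System.out.println("delete after writer closed: allowed");
        }
        Files.deleteIfExists(indexDir);
    }
}
```

Atomic file creation is what makes this exclusive: of two processes
racing for the lock, only one can create the file. The price of such a
simple scheme is that a crashed process leaves a stale write.lock behind,
which then has to be cleared by hand or by some recovery step.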
Still, intermingling inserts and deletes would be less efficient, since
it would require the dictionaries for each altered segment to be re-read
in order to find the document number.

So it could be done. But should it be?

Doug
