> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>
> If there is one "user" performing additions and deletions, then the two
> can be ordered. But if an application is such that it allows multiple
> people to initiate index updates of various kinds, it may be much harder
> to order additions and deletions.

Only one "user" is currently permitted to perform additions at once. This is enforced by the "write.lock" file. It would be easy to extend this restriction so that only a single user is permitted to perform additions or deletions at once. Lucene does not support simultaneous index modification by multiple processes. This restriction is just not yet properly enforced by the deletion code.

> I agree that the DocumentAdder would be a clearer name for the
> IndexWriter. Also, +1 on documenting the preferred operation order and
> enforcing it if possible.

Cool. I think this is the approach that best keeps Lucene "lean and mean".

> However, there are applications where this becomes very awkward. I think
> the main need for doing delete + add in one operation is when replacing
> documents with more up-to-date copies.

How awkward is it to open a reader, delete a document, close it, open a writer, add a document, and then close the writer? If that's really too much work, we could add a utility method to encapsulate it. However, if you're updating more than a single document, it's much more efficient to first do all the deletions, then do all the additions. So adding that utility method might then encourage folks to write inefficient code.

Perhaps the utility method to add is something like:

  void updateDocs(Document[] docs, String idField);

This would delete any documents currently in an index that have the same value for 'idField' as a document in 'docs', then add all the documents in 'docs'. This API would encourage batching. Its implementation would be to open a reader, do the deletions, close the reader, open a writer, do the additions, then close the writer.

> Document ids are, of course, segment-specific and change during merge.
> This makes searches fast, but it makes it impossible to identify a
> document. But what if we add a "special" field, or add a unique document
> id in some other way?
> The searches will still use the segment-specific ids and remain fast, but
> there would be a unique id assigned to each document that applications
> could use if needed and also the replace operation could use in the
> IndexWriter. Obviously, we would have to make sure that these ids can be
> created quickly by multiple writers without a possibility of duplicate
> ids.
>
> Would this work?

Sure, it *could* work. But we'd need to add a new special dictionary for document ids that is written to disk. This would be smaller, and hence faster to access, than the term dictionary that is now used for document ids. All of the indexing code (creating, merging, reading) would have to be modified to support this id dictionary. And still, batched deletions would be faster than intermingled insertions and deletions, just not by as much.

Is it worth it? The current use of document fields for unique ids builds on existing code, which is nice.

Doug
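A minimal sketch of the batching behaviour the proposed `updateDocs(Document[] docs, String idField)` utility is meant to encourage: collect the ids of every incoming document, perform all deletions in one pass, then perform all additions. To keep it runnable without Lucene, it assumes a hypothetical in-memory `InMemoryIndex` where a document is a `Map<String, String>` of field values; these names are illustrative only and are not Lucene's API. The real method, as described above, would instead do the deletions through a reader, close it, and then do the additions through a writer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for an index (NOT Lucene's API).
// A "document" is just a map from field name to field value.
class InMemoryIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Batched replace: every deletion happens before any addition,
    // mirroring "open a reader, delete, close; open a writer, add, close".
    void updateDocs(List<Map<String, String>> incoming, String idField) {
        // Collect the ids of all incoming documents up front.
        Set<String> ids = new HashSet<>();
        for (Map<String, String> doc : incoming) {
            String id = doc.get(idField);
            if (id != null) {
                ids.add(id);
            }
        }

        // Phase 1 (the "reader" step in the proposal): one pass deletes
        // every stored document whose idField value is being replaced.
        docs.removeIf(doc -> ids.contains(doc.get(idField)));

        // Phase 2 (the "writer" step in the proposal): add all new copies.
        docs.addAll(incoming);
    }

    int size() {
        return docs.size();
    }

    // Returns the value of 'field' for the first document whose idField equals 'id'.
    String lookup(String idField, String id, String field) {
        for (Map<String, String> doc : docs) {
            if (id.equals(doc.get(idField))) {
                return doc.get(field);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.updateDocs(List.of(Map.of("id", "a", "body", "v1")), "id");
        // Replace "a" with a newer copy and add "b" in a single batch.
        index.updateDocs(List.of(Map.of("id", "a", "body", "v2"),
                                 Map.of("id", "b", "body", "v1")), "id");
        System.out.println(index.size() + " " + index.lookup("id", "a", "body")); // prints "2 v2"
    }
}
```

The in-memory version only illustrates the ordering; the efficiency argument in the message comes from paying the reader and writer open/close costs once per batch rather than once per replaced document, which this sketch does not model.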
