On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer <simon.willna...@gmail.com> wrote:
> Sean seriously a couple of hundred docs a second, don't bother just
> use updateDocument. My benchmarks show that there is only a smallish
> impact during indexing especially with concurrent flushing in lucene
> 4. I don't know how resource intensive your analysis chain is but on a
> decent machine you can easily go > 20k docs a second with
> updateDocument.
>
> If you want to give deleteByDocid a try for kicks I'd be curious how
> you solve some of the really tricky issues! :)

This (adding delete-by-docID to IndexWriter) has been requested fairly frequently... But the problem is that docIDs can suddenly change whenever a merge commits, so I don't see how we can add it in general.

That said, there is an initial patch here:

    https://issues.apache.org/jira/browse/LUCENE-4203

It adds IW.tryDeleteDocument(AtomicReader reader, int docID), with the requirement that the reader is a near-real-time reader obtained from the writer. The delete will succeed (return true) if that reader has not yet been merged away; otherwise it fails (returns false) and you have to do the delete the "normal" way (by Term).

I won't have much time to get back to that issue in the near future, so feel free to take it!

Mike McCandless

http://blog.mikemccandless.com
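
For illustration, the try-then-fall-back usage described above could look roughly like the sketch below. It deliberately does not depend on Lucene: the Writer interface, FakeWriter, and deleteDoc are hypothetical stand-ins that only mirror the behavior described in this mail (true if the reader is still live, false if it was merged away), and the patch's actual signature may differ.

```java
import java.util.HashSet;
import java.util.Set;

final class DeleteFallback {
    /** Hypothetical stand-in for the two writer calls discussed above; NOT Lucene's real API. */
    interface Writer {
        // Assumed contract: true if the delete was applied via docID,
        // false if the reader's segment has already been merged away.
        boolean tryDeleteDocument(Object nrtReader, int docID);

        // The "normal" delete, keyed by a unique term such as an id field.
        void deleteByTerm(String field, String value);
    }

    /** Tries the docID delete first, then falls back to delete-by-term. */
    static boolean deleteDoc(Writer w, Object nrtReader, int docID,
                             String idField, String idValue) {
        if (w.tryDeleteDocument(nrtReader, docID)) {
            return true;                      // fast path succeeded
        }
        w.deleteByTerm(idField, idValue);     // reader was merged away
        return false;
    }

    /** In-memory fake used only to exercise the control flow. */
    static final class FakeWriter implements Writer {
        final Set<Object> liveReaders = new HashSet<>();
        int byDocId = 0;
        int byTerm = 0;

        public boolean tryDeleteDocument(Object nrtReader, int docID) {
            if (!liveReaders.contains(nrtReader)) {
                return false;                 // simulates a merged-away segment
            }
            byDocId++;
            return true;
        }

        public void deleteByTerm(String field, String value) {
            byTerm++;
        }
    }
}
```

The point of the fallback is that the caller must still keep a unique term for every document, since the docID path can fail at any time once a merge commits.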