Hi I recently read CHANGES to learn more about the readOnly parameter IndexReader now supports, and came across LUCENE-1329 with a comment that isDeleted was made not synchronized if readOnly=true (e.g. ReadOnlyIndexReader), which can affect search code, as it is usually the bottleneck for search operations.
I searched the code and was surprised to see isDeleted and hasDeletions are not called from any search code. Instead, the code, such as SegmentTermDocs, uses the deletedDocs instance directly. So in fact isDeleted wasn't the bottleneck (unless this was part of the changes in 1329). Anyway, doesn't matter, that's good ! However, I did find out some indexing code whic calls these two, like SegmentMerger when it merges fields and term vectors. I think that we can improve that code by writing some specialized code for merging - if the reader has no deletions, there's no point checking for each document if there are deletions and/or if the document was deleted. In fact, SegmentMerger checks for each document: (1) whether the reader has deletions and if the document was deleted, (2) if the reader has a matching reader and (3) if checkAbort is not null. I have a couple of suggestions to simplify that code: 1. Create two specialized copyFieldsWithDeletions/copyFieldsNoDeletions to get rid of the unnecessary if (hasDeletions) check for each doc. 2. In these, check if the reader has matching reader or not, and execute the appropriate code in each case. 3. Also, check up front if checkAbort != null. 3.1 (3) duplicates the code - so perhaps instead I can create a dummy checkAbort, which does nothing? That way we'll always call checkAbort.work(). But this adds a method call, so I'm not sure. (same optimizations for mergeVectors()). In addition, I think something can be done w/ SegmentTermDocs. Either create a specialized STD based on whether it has deletions or not, or create a dummy BitVector which returns false for every check. That way we can eliminate the checks in each next(), skipTo(). Dummy BitVector will leave the file as-is, but will add a method call, so I think I lean towards the specialized STD. This can be as simple as making STD abstract, with a static create() method which creates the right instance. I believe there are other places where we can make such optimizations. If the overall direction seems right to you, I can open an issue and start to work on a patch. Currently I've patched SegmentMerger, and besides the class getting bigger, nothing bad happened (meaning all tests pass), though it might be interesting to check how it affects performance. What do you think? Shai