> I recently read CHANGES to learn more about the readOnly parameter > IndexReader now supports, and came across LUCENE-1329 with a comment that > isDeleted was made not synchronized if readOnly=true (e.g. > ReadOnlyIndexReader), which can affect search code, as it is usually the > bottleneck for search operations. > > I searched the code and was surprised to see isDeleted and hasDeletions are > not called from any search code. Instead, the code, such as SegmentTermDocs, > uses the deletedDocs instance directly. So in fact isDeleted wasn't the > bottleneck (unless this was part of the changes in 1329). Anyway, doesn't > matter, that's good !
You're right! (I had also thought it was more widely used). I guess apps that iterate through many docs (calling document()) might hit this? Also, it's only in 2.9 that MatchAllDocsQuery doesn't use isDeleted. So I agree the benefit of using readOnly reader is presumably not much, as of 2.9... Actually: I think we should also change IndexReader.document to not check if it's deleted? (Renaming it to something like rawDocument(), storedDocument(), something, in the process, and deprecating the old one). > However, I did find out some indexing code whic calls these two, like > SegmentMerger when it merges fields and term vectors. I think that we can > improve that code by writing some specialized code for merging - if the > reader has no deletions, there's no point checking for each document if > there are deletions and/or if the document was deleted. In fact, > SegmentMerger checks for each document: (1) whether the reader has deletions > and if the document was deleted, (2) if the reader has a matching reader and > (3) if checkAbort is not null. > > I have a couple of suggestions to simplify that code: > 1. Create two specialized copyFieldsWithDeletions/copyFieldsNoDeletions to > get rid of the unnecessary if (hasDeletions) check for each doc. > > 2. In these, check if the reader has matching reader or not, and execute the > appropriate code in each case. I think this make sense. > 3. Also, check up front if checkAbort != null. Be careful: checkAbort needs to be called "fairly" frequently so IndexWriter.close(false) doesn't take too long. > 3.1 (3) duplicates the code - so perhaps instead I can create a dummy > checkAbort, which does nothing? That way we'll always call > checkAbort.work(). But this adds a method call, so I'm not sure. I think a dummy impl is fine; it's only null during unit tests and in IndexWriter.addIndexes(IndexReader[]). > In addition, I think something can be done w/ SegmentTermDocs. Either create > a specialized STD based on whether it has deletions or not, or create a > dummy BitVector which returns false for every check. That way we can > eliminate the checks in each next(), skipTo(). Dummy BitVector will leave > the file as-is, but will add a method call, so I think I lean towards the > specialized STD. This can be as simple as making STD abstract, with a static > create() method which creates the right instance. > > I believe there are other places where we can make such optimizations. If > the overall direction seems right to you, I can open an issue and start to > work on a patch. Currently I've patched SegmentMerger, and besides the class > getting bigger, nothing bad happened (meaning all tests pass), though it > might be interesting to check how it affects performance. I think this makes sense. Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org