We recently upgraded from Lucene 2.4.0 to Lucene 3.0.2. Our load testing
revealed a serious performance drop specific to traversing the list of terms
and their associated documents for a given indexed field. Our code looks
something like this:
    for (Term term : terms) {
        TermDocs termDocs = indexReader.termDocs(term);
        while (termDocs.next()) { // much slower here
            int doc = termDocs.doc();
            // ...do something with each doc...
        }
    }
The slowness is all on the first call to TermDocs.next() for each term.
Further investigation comparing 2.4.0 and 3.0.2 revealed that there is some new
synchronization in the SegmentTermDocs constructor and in
SegmentReader.getTermsReader(). The first call to next() hits this
synchronization, causing a 4x slowdown on an 8-CPU machine.
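
To illustrate the kind of contention I mean, here is a toy model in plain Java. It is not Lucene code: SharedReader, getPostings() and run() are hypothetical names, and the synchronized accessor only stands in for the locking described above. Each worker thread either takes the shared lock once per term or fetches the reference once and reuses it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedLockContention {

    /** Hypothetical stand-in for a reader whose accessor is synchronized. */
    static final class SharedReader {
        private final int[] postings = {1, 2, 3, 4};

        // Every caller serializes on this monitor, even though the data is read-only.
        synchronized int[] getPostings() {
            return postings;
        }
    }

    /**
     * Runs `threads` workers, each processing `termsPerThread` terms, and
     * returns the total number of docs visited across all workers.
     */
    static long run(int threads, int termsPerThread, boolean lockPerTerm) {
        SharedReader reader = new SharedReader();
        AtomicLong visited = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                // When not locking per term, take the lock once per thread and reuse the result.
                int[] cached = lockPerTerm ? null : reader.getPostings();
                long local = 0;
                for (int i = 0; i < termsPerThread; i++) {
                    // When locking per term, every iteration contends for the shared monitor.
                    int[] postings = lockPerTerm ? reader.getPostings() : cached;
                    local += postings.length;
                }
                visited.addAndGet(local);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return visited.get();
    }

    public static void main(String[] args) {
        for (boolean lockPerTerm : new boolean[] {true, false}) {
            long start = System.nanoTime();
            long visited = run(8, 1_000_000, lockPerTerm);
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println((lockPerTerm ? "lock per term:   " : "lock per thread: ")
                    + visited + " docs in " + ms + " ms");
        }
    }
}
```

Both modes visit the same number of docs; only the number of lock acquisitions differs. Timings depend heavily on the JVM and hardware, so this only shows the shape of the problem and does not reproduce the 4x figure.
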
My first question: should we be using a different, more efficient approach to
process each term's doc list? The synchronization appears to guard aspects of
these classes that the next() operation is not concerned with.
My other question: are there any planned enhancements to address this loss of
performance?
Thanks.
John