Performance in TermInfosReader

Dmitry Serebrennikov Sat, 27 Oct 2001 13:28:20 -0700

Greetings,

I'm stress testing and optimizing our application for high 
concurrency rates, and I'm seeing a lot of contention over the 
synchronization monitor in TermInfosReader.terms(Term). Our application 
tends to do a lot of navigation through the term dictionary to resolve 
each user's request. This probably isn't a typical situation for Lucene, 
but has anyone seen this?
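
To make the problem concrete, here is a minimal sketch of one general way to ease this kind of contention: give each thread its own cursor over shared, immutable term data, so no monitor is needed on the lookup path. It assumes the contended state is a per-reader cursor, which is a guess about the cause and not a description of Lucene's code. The class PerThreadTermReader and its methods are hypothetical, and a sorted in-memory array stands in for the on-disk dictionary.

```java
import java.util.Arrays;

/** Hypothetical reader; the names are illustrative and not Lucene's API. */
class PerThreadTermReader {
    // Immutable after construction, so it is safe to share between threads.
    private final String[] sortedTerms;

    /** Mutable position state; each thread gets its own instance. */
    static final class Cursor {
        int position;
    }

    // One cursor per thread, created lazily on first use.
    private final ThreadLocal<Cursor> cursors = ThreadLocal.withInitial(Cursor::new);

    PerThreadTermReader(String[] sortedTerms) {
        this.sortedTerms = sortedTerms.clone();
    }

    /** Moves the calling thread's cursor to the first term >= target. */
    int seek(String target) {
        Cursor c = cursors.get();
        int i = Arrays.binarySearch(sortedTerms, target);
        c.position = i >= 0 ? i : -i - 1; // a negative result encodes the insertion point
        return c.position;
    }

    /** Returns the term under the calling thread's cursor, or null past the end. */
    String current() {
        int p = cursors.get().position;
        return p < sortedTerms.length ? sortedTerms[p] : null;
    }
}
```

The trade-off is one cursor object per thread instead of one lock shared by all threads; whether that is acceptable depends on how many threads touch the reader.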


I'm using the OptimizeIt profiler (it's GREAT!), which is how I know. 
Tomorrow I'm going to look at what can be done either on the application 
side or in Lucene to ease this contention. Does anyone have any ideas / 
suggestions / experience in this area?

More specific info:
The actual operation that the application is performing involves 
searching for a large number of terms (100-200) in the dictionary, which 
may or may not be there. These terms are sorted by term number (and thus 
lexicographically too). At first, I tried to have a single TermEnum and 
scroll through it. This turned out to be very slow. Creating a new enum 
using terms(Term) seems to work better. There were many other 
bottlenecks all over the place that I had to clear out and now I'm back 
at this same issue.
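
As an illustration of the access pattern, the following is a minimal sketch of looking up a sorted batch of terms in one forward pass, using a sparse in-memory index to skip ahead rather than scrolling term by term. The class SparseTermLookup, its methods, and the interval parameter are hypothetical; a sorted array stands in for the dictionary, and this is not how TermInfosReader is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a term dictionary with a sparse index. */
class SparseTermLookup {
    private final String[] terms;      // full sorted dictionary, no duplicates
    private final String[] indexTerms; // every interval-th term, kept in memory
    private final int interval;        // must be > 0

    SparseTermLookup(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Returns the position of the first term >= target, never moving before 'from'. */
    int seek(String target, int from) {
        // Find the last index term <= target with a binary search.
        int pos = Arrays.binarySearch(indexTerms, target);
        int block = pos >= 0 ? pos : -pos - 2; // insertion point minus one
        int start = Math.max(from, Math.max(block, 0) * interval);
        // Finish with a short linear scan of at most one block.
        while (start < terms.length && terms[start].compareTo(target) < 0) {
            start++;
        }
        return start;
    }

    /** Looks up sorted query terms in a single forward pass; returns those present. */
    List<String> findAll(String[] sortedQueries) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        for (String q : sortedQueries) {
            pos = seek(q, pos);
            if (pos < terms.length && terms[pos].equals(q)) {
                found.add(q);
            }
        }
        return found;
    }
}
```

Because the query terms are sorted, each seek can start from the previous position, so the whole batch costs one binary search plus a bounded scan per term.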

Doug, what would be an approach for making TermEnums "seekable" in an 
efficient manner?

On the term vector support:
I made some substantial changes in order to improve performance. The 
interface is now different. It is more like an enum, so that you seek to 
a particular document and then access its term vector. Then you move to 
another one. This significantly cuts down on needless memory allocation 
since no TermVector objects need to be created. If anyone has had a 
chance to take a look at the code I released previously, feedback would 
be welcome! :)
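
For readers who have not seen the code, the enum-style access described above might look roughly like the following sketch. The class TermVectorCursor and its method names are hypothetical and are not the interface from the released code; term numbers and frequencies are held in plain arrays purely for illustration.

```java
/** Hypothetical cursor over per-document term vectors; names are illustrative only. */
class TermVectorCursor {
    private final int[][] termIds; // termIds[doc] = term numbers occurring in that document
    private final int[][] freqs;   // freqs[doc][i] = frequency of termIds[doc][i]
    private int doc = -1;          // -1 means "not positioned on any document"

    TermVectorCursor(int[][] termIds, int[][] freqs) {
        this.termIds = termIds;
        this.freqs = freqs;
    }

    /** Positions the cursor on docId; returns false if there is no such document. */
    boolean seek(int docId) {
        if (docId < 0 || docId >= termIds.length) {
            doc = -1;
            return false;
        }
        doc = docId;
        return true;
    }

    /** Number of distinct terms in the current document. */
    int size() {
        checkPositioned();
        return termIds[doc].length;
    }

    /** Term number of the i-th entry in the current document. */
    int termId(int i) {
        checkPositioned();
        return termIds[doc][i];
    }

    /** Frequency of the i-th entry in the current document. */
    int freq(int i) {
        checkPositioned();
        return freqs[doc][i];
    }

    private void checkPositioned() {
        if (doc < 0) {
            throw new IllegalStateException("call seek() successfully first");
        }
    }
}
```

The point of this shape is that moving between documents only changes an integer, so no per-document object is allocated while iterating.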

Dmitry
