Greetings,

I'm doing some stress testing and optimization of our application at high concurrency rates, and I'm seeing a lot of contention on the synchronization monitor in TermInfosReader.terms(Term). Our application tends to do a lot of navigation through the term dictionary to resolve each user's request. This probably isn't a typical situation for Lucene, but has anyone else seen this?

I'm using the OptimizeIt profiler (it's GREAT!), which is how I know. Tomorrow I'm going to look at what can be done, either on the application side or in Lucene, to ease this contention. Does anyone have any ideas / suggestions / experience in this area?

More specific info: the operation the application performs involves looking up a large number of terms (100-200) in the dictionary, any of which may or may not be there. These terms are sorted by term number (and thus lexicographically too). At first, I tried to keep a single TermEnum and scroll through it. This turned out to be very slow. Creating a new enum for each term using terms(Term) seems to work better. There were many other bottlenecks all over the place that I had to clear out, and now I'm back at this same issue.

Doug, what would be an approach for making TermEnums "seekable" in an efficient manner?

On the term vector support: I made some substantial changes in order to improve performance. The interface is now different. It is more like an enum: you seek to a particular document and access its term vector, then move on to another one. This significantly cuts down on needless memory allocation, since no TermVector objects need to be created. If anyone has had a chance to look at the code I released previously, feedback would be welcome! :)

Dmitry
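
P.S. To make the two lookup strategies above concrete (scrolling one enum forward versus seeking separately for each term), here is a minimal sketch that does not use Lucene at all. The class TermSeekSketch, its sorted String[] dictionary, and the steps counter are all hypothetical stand-ins for illustration; this is not how TermInfosReader is implemented, and it says nothing about the monitor contention itself. It only shows why a per-term seek can touch far fewer entries than a scroll when the looked-up terms are sparse relative to the dictionary.

```java
import java.util.Arrays;

// Toy stand-in for a sorted term dictionary. NOT Lucene's TermInfosReader;
// every name here is hypothetical and exists only for illustration.
class TermSeekSketch {
    private final String[] terms; // assumed sorted lexicographically
    long steps;                   // number of term comparisons performed

    TermSeekSketch(String[] sortedTerms) {
        this.terms = sortedTerms;
    }

    // Strategy 1: a single cursor scrolled forward through the dictionary.
    // Requires the queries to be sorted; visits every entry up to the last query.
    boolean[] scroll(String[] sortedQueries) {
        boolean[] found = new boolean[sortedQueries.length];
        int pos = 0;
        for (int q = 0; q < sortedQueries.length; q++) {
            while (pos < terms.length) {
                steps++;
                if (terms[pos].compareTo(sortedQueries[q]) >= 0) break;
                pos++;
            }
            found[q] = pos < terms.length && terms[pos].equals(sortedQueries[q]);
        }
        return found;
    }

    // Strategy 2: an independent seek (binary search) for each query,
    // loosely analogous to opening a fresh enum positioned at the term.
    boolean[] seekEach(String[] queries) {
        boolean[] found = new boolean[queries.length];
        for (int q = 0; q < queries.length; q++) {
            int lo = 0, hi = terms.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                steps++;
                if (terms[mid].compareTo(queries[q]) < 0) lo = mid + 1;
                else hi = mid;
            }
            found[q] = lo < terms.length && terms[lo].equals(queries[q]);
        }
        return found;
    }

    public static void main(String[] args) {
        // Dictionary of 100,000 even-numbered terms: t000000, t000002, ...
        String[] dict = new String[100_000];
        for (int i = 0; i < dict.length; i++) dict[i] = String.format("t%06d", i * 2);
        String[] queries = {"t000010", "t000011", "t150000", "t199999"};

        TermSeekSketch scrolling = new TermSeekSketch(dict);
        TermSeekSketch seeking = new TermSeekSketch(dict);
        System.out.println("scroll: " + Arrays.toString(scrolling.scroll(queries))
                + " comparisons=" + scrolling.steps);
        System.out.println("seek:   " + Arrays.toString(seeking.seekEach(queries))
                + " comparisons=" + seeking.steps);
    }
}
```

Both methods report the same hits and misses; only the number of comparisons differs. In a real index a seek would also have to pay for locating and reading the right part of the term file, which this sketch ignores.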
