Hi Yonik,
Your patch has corrected the thread thrashing problem on multi-cpu systems.
I've tested it with both 1.4.3 and 1.9. I haven't seen a 100X performance
gain, but that's because I'm caching QueryFilters and Lucene is caching the
sort fields.
Thanks for the fast response!
Here is one stack trace:
Full thread dump Java HotSpot(TM) Client VM (1.5.0_03-b07 mixed mode):
"Thread-6" prio=5 tid=0x6cf7a7f0 nid=0x59e50 waiting for monitor entry [0x6d2cf000..0x6d2cfd6c]
    at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:241)
    - waiting to lock 0x04e40278 (a
Thanks for the trace Peter, and great catch!
It certainly does look like avoiding the construction of the docMap for a
MultiTermEnum will be a significant optimization.
-Yonik
Now hiring -- http://tinyurl.com/7m67g
On 10/12/05, Peter Keegan [EMAIL PROTECTED] wrote:
Here is one stack trace:
Here's the patch:
http://issues.apache.org/jira/browse/LUCENE-454
It resulted in quite a performance boost indeed!
On 10/12/05, Yonik Seeley [EMAIL PROTECTED] wrote:
Thanks for the trace Peter, and great catch!
It certainly does look like avoiding the construction of the docMap for a
MultiTermEnum will be a significant optimization.
On a multi-cpu system, this loop to build the docMap array can cause severe
thread thrashing because of the synchronized method 'isDeleted'. I have
observed this on an index with over 1 million documents (which contains a
few thousand deleted docs) when multiple threads perform a search with
I'm not sure that looks like a safe patch.
Synchronization does more than help prevent races... it also introduces
memory barriers.
Removing synchronization from objects that can change is very tricky business
(witness the double-checked locking antipattern).
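To make the pitfall concrete, here is a minimal sketch of the double-checked locking idiom in Java. The class and field names (DclSketch, Helper, get) are hypothetical, not from Lucene. Without volatile, the unsynchronized first check could observe a partially constructed object under the pre-Java-5 memory model, precisely because skipping the lock also skips its memory barrier:

```java
// Double-checked locking: the unlocked fast path is only safe because
// the field is volatile. Dropping either the volatile or the inner
// synchronized check reintroduces the classic antipattern.
public class DclSketch {
    static class Helper { int x = 42; }

    private static volatile Helper instance;   // volatile is the modern fix

    static Helper get() {
        Helper h = instance;
        if (h == null) {                       // unlocked fast path, no barrier
            synchronized (DclSketch.class) {
                if (instance == null)          // re-check under the lock
                    instance = new Helper();
                h = instance;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(get().x);           // prints 42
        System.out.println(get() == get());    // prints true: one instance
    }
}
```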
-Yonik
I noticed the following code that builds the docMap array in
SegmentMergeInfo.java for the case where some documents might be deleted from
an index:
// build array which maps document numbers around deletions
if (reader.hasDeletions()) {
int maxDoc = reader.maxDoc();
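The quoted snippet is cut off here. Based on the description earlier in the thread (a per-document loop over the synchronized isDeleted method), here is a self-contained sketch of that loop's shape, plus a snapshot-style variant that takes the lock only once. Bits and buildDocMap are hypothetical stand-ins, not Lucene's actual classes:

```java
// Sketch of why a per-document synchronized call serializes reader
// threads, and of a snapshot-based variant that locks only once.
public class DocMapSketch {
    static class Bits {
        private final boolean[] deleted;
        Bits(boolean[] deleted) { this.deleted = deleted; }

        // N threads scanning maxDoc docs each contend for this monitor
        // N * maxDoc times -- the thrashing Peter observed.
        synchronized boolean isDeleted(int i) { return deleted[i]; }

        // One lock acquisition, then lock-free reads from the copy.
        synchronized boolean[] snapshot() { return deleted.clone(); }
    }

    // Build a docMap that renumbers documents around deletions,
    // reading from the snapshot instead of calling isDeleted per doc.
    static int[] buildDocMap(Bits bits, int maxDoc) {
        boolean[] deleted = bits.snapshot();
        int[] docMap = new int[maxDoc];
        int j = 0;
        for (int i = 0; i < maxDoc; i++)
            docMap[i] = deleted[i] ? -1 : j++;
        return docMap;
    }

    public static void main(String[] args) {
        boolean[] del = {false, true, false, false, true};
        int[] map = buildDocMap(new Bits(del), del.length);
        System.out.println(java.util.Arrays.toString(map)); // prints [0, -1, 1, 2, -1]
    }
}
```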
Lokesh Bajaj wrote:
For a very large index where we might want to delete or replace some documents,
this would require a lot of memory (100 million documents would need 381 MB).
Is there any reason why it was implemented this way?
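The 381 MB figure checks out: docMap holds one 4-byte int per document. A quick sketch of the arithmetic:

```java
// One int per document in docMap; 100 million docs at 4 bytes each.
public class DocMapMemory {
    public static void main(String[] args) {
        long docs = 100_000_000L;
        long bytes = docs * Integer.BYTES;          // 400,000,000 bytes
        System.out.println(bytes / (1024 * 1024));  // prints 381 (MB)
    }
}
```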
In practice this has not been an issue. A