First, I decided I wasn't comfortable doing closes on the IndexReader. So, I
did what I hope is better: I create a singleton SearcherManager (available
out of the box since the 4.1 release) and do acquire/release around each
search. I assume that's more or less equivalent anyway.
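In case it matters, the pattern I'm using looks roughly like this (a sketch,
not my exact code; indexDir stands in for the real path):

```java
import java.io.File;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.FSDirectory;

// Built once at startup and shared by all searches.
SearcherManager mgr =
    new SearcherManager(FSDirectory.open(new File(indexDir)), null);

// Per search: acquire, use, release.
IndexSearcher searcher = mgr.acquire();
try {
    // run the query against 'searcher'
} finally {
    mgr.release(searcher);
}

// Periodically (or after the index changes), pick up new segments:
mgr.maybeRefresh();
```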
Second, it doesn't really matter, as I am still seeing the same slow searches.
I'm becoming convinced that the problem is in the indexer (see below for why).
So, briefly, there are two parts to my use of Lucene (all running on Windows).
The first part is a Windows service that does the indexing. It reads a
directory which has new items to be indexed. The indexing is completely
serialized (meaning there are no multiple threads): one document is fully
indexed before it moves on to the next. Even at that, I'm averaging about
14 ms per document on a fairly old machine. Each document is an XML file
averaging about 4 KB.
The searching happens in a Tomcat web server. Obviously, there may be multiple
simultaneous searches.
Here's what I did today. I did a full reindex (all the documents are in
directories I can walk on the local hard drive); there were roughly 600k
documents. The reindex is a separate program which simply does the reindex and
quits: it opens the index, indexes all of the files (no intermediate commits),
does a forceMerge(2) as its last step, and then closes the writer (which I
assume forces a commit). Neither the web server nor the indexing service was
running while the reindex was going on (i.e., I don't think there was anything
touching the index other than the reindex program itself). Here's what the
index directory looked like after the reindex completed (the value in
parentheses is the total bytes for those files):
61 .cfe (17.7 KB)
61 .cfs (2.09 GB)
61 .si  (16.9 KB)
42 .del (23.1 KB)
10 .fdt (32.2 MB)
10 .fdx (12.8 KB)
10 .fnm (11.1 KB)
10 .pos (157 MB)
10 .tim (28.7 MB)
10 .tip (582 KB)
10 .tvd (254 KB)
10 .tvf (232 MB)
10 .tvx (2 MB)
10 .doc (62.5 MB)
 1 segments_1 (2 KB)
 1 segments.gen (1 KB)
So, 377 files for a total of 2.6 GB, most of it in the .cfs files.
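For reference, the reindex program boils down to this (a sketch, not the
exact code; buildDocument() stands in for my XML-to-Document conversion, and
the CREATE open mode is how I start from a fresh index):

```java
import java.io.File;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

IndexWriterConfig iwc =
    new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);  // start fresh

IndexWriter writer =
    new IndexWriter(FSDirectory.open(new File(indexDirectory)), iwc);
for (File f : xmlFiles) {
    writer.addDocument(buildDocument(f));  // no intermediate commits
}
writer.forceMerge(2);  // merge down to at most 2 segments
writer.close();        // commits pending changes on close
```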
I then restarted the Windows service. Since then (about 2 hours), there are
now 82 .cfs files; 51 of them range from 29.8 to 51.2 MB each (2.09 GB total).
So I'm pretty convinced the issue is in the indexing, since I still haven't
done any searching yet.
The index writer is initialized as follows:
FSDirectory dir = FSDirectory.open(new File(indexDirectory));
IndexWriterConfig iwc = new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);
LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
lbsm.setMaxMergeDocs(10);
lbsm.setUseCompoundFile(true);
iwc.setMergePolicy(lbsm);
_oWriter = new IndexWriter(dir, iwc);
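Rereading the javadoc, I wonder about the setMaxMergeDocs(10) line: as I read
it, LogByteSizeMergePolicy#setMaxMergeDocs caps the size, in documents, of
segments that are still eligible for merging, so a value of 10 would exclude
nearly every real segment from ever being merged. If that's right, the same
setup without the cap would be (a sketch, assuming the iwc from above):

```java
LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
// Leave maxMergeDocs at its default (unlimited) so segments of any
// document count can be merged; setMaxMergeDocs(10) would bar merging
// for any segment holding more than 10 documents.
lbsm.setUseCompoundFile(true);
iwc.setMergePolicy(lbsm);
```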
But I also notice that I added the following. The intent was to have the
writer flush its buffer when it had indexed enough documents to reach 50 MB
(an arbitrary number I picked out of the air because it felt right :-) ). It
seems odd to me that the maximum size of the CFS files is also about 50 MB,
so I'm wondering if this affects the writer's ability to merge files.
// don't flush based on number of documents
// flush based on buffer size
_oWriter.getConfig().setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
.setRAMBufferSizeMB(50.0);
Any help in figuring out what is causing this problem would be appreciated. I
now have an offline system that I can play with, so I can do some intrusive
things if need be.
Scott
-----Original Message-----
From: Scott Smith [mailto:[email protected]]
Sent: Saturday, March 16, 2013 1:28 PM
To: [email protected]
Subject: RE: Lucene slow performance
Thanks for the help.
The reindex was done this morning and searches now take less than a second.
I will make the change to the code.
Cheers
Scott
-----Original Message-----
From: Uwe Schindler [mailto:[email protected]]
Sent: Friday, March 15, 2013 11:17 PM
To: [email protected]
Subject: RE: Lucene slow performance
Please forceMerge only once, not every time (only to clean up your index)!
If you are doing a reindex already, just fix your close logic as discussed
before.
Scott Smith <[email protected]> schrieb:
>Unfortunately, this is a production system which I can't touch (though
>I was able to get a full reindex scheduled for tomorrow morning).
>
>Are you suggesting that I do:
>
>writer.forceMerge(1);
>writer.close();
>
>instead of just doing the close()?
>
>-----Original Message-----
>From: Simon Willnauer [mailto:[email protected]]
>Sent: Friday, March 15, 2013 5:08 PM
>To: [email protected]
>Subject: Re: Lucene slow performance
>
>On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith
><[email protected]> wrote:
>> "Do you always close IndexWriter after adding few documents and when
>> closing, disable "wait for merge"? In that case, all merges are
>> interrupted and the merge policy never has a chance to merge at all
>> (because you are opening and closing IndexWriter all the time with
>> cancelling all merges)?"
>>
>> Frankly I don't quite understand what this means. When I "close" the
>indexwriter, I simply call close(). Is that the wrong thing?
>that should be fine...
>
>this sounds very odd though; do you see files that actually get
>removed/merged if you call IndexWriter#forceMerge(1)?
>
>simon
>>
>> Thanks
>>
>> Scott
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:[email protected]]
>> Sent: Friday, March 15, 2013 4:49 PM
>> To: [email protected]
>> Subject: RE: Lucene slow performance
>>
>> Hi,
>>
>> with the standard configuration, this cannot happen. What merge policy
>> do you use? This looks to me like a misconfigured merge policy or use
>> of NoMergePolicy. With 3,000 segments it will be slow; the question
>> is, why do you get those?
>>
>> Another thing could be: do you always close the IndexWriter after
>> adding a few documents and, when closing, disable "wait for merge"?
>> In that case, all merges are interrupted and the merge policy never
>> has a chance to merge at all (because you are opening and closing the
>> IndexWriter all the time, cancelling all merges).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [email protected]
>>
>>> -----Original Message-----
>>> From: Scott Smith [mailto:[email protected]]
>>> Sent: Friday, March 15, 2013 11:15 PM
>>> To: [email protected]
>>> Subject: Lucene slow performance
>>>
>>> We have a system that is using Lucene and the searches are very slow.
>>> The number of documents is fairly small (less than 30,000) and each
>>> document is typically only 2 to 10 kilo-characters. Yet, searches are
>>> taking 15-16 seconds.
>>>
>>> One of the things I noticed was that the index directory has several
>>> thousand (3000+) .cfs files. We do optimize the index once per day.
>>> This is a system that probably gets several thousand document deletes
>>> and additions per day (spread out across the day).
>>>
>>> Any thoughts? We didn't really notice this until we went to 4.x.
>>>
>>> Scott
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de