> I am not using RAMDirectory due to the large size of > index file. the index generated on hard disc is 1.57G > for 1 million documents, each document has average 500 > terms. I am using Field.UnStored(fieldName, terms), so > i beliece I am not storing the documents, just the > index. (is that right?) is there anyway to reduce the
Correct. > index size created? also What is the maximum size of > data can be stored in RAMDirectory? I suppose I could Depends on how much RAM you have. > get a 10G RAM solaris box, but would that be advisable > say storing 2-3G of index data in memory? Yes, why not. > Also, what > is the performance boost factor when RAMDirectory > comparing to FSDirectory. Are we taling about > 100% > here? I don't know if I can qualify it like that, but one of the Lucene articles (Resources section on Lucene's site) covers RAMDirectory and shows performance differences. > On your 2nd and 3rd suggestion, I probably run the > latest code that includes the fix by Dmitry > Serebrennikov, the build was checked out from CVS > yesterday. and I used a QueryParser similar to the one > used in the demo code. Dmitry's code was never included in CVS. I don't know/remember why. > Again, I still feel a bit curious and want to find out > does lucene do (or in the future) pre-filter on "AND > join conditions". For example, A AND (B OR C OR D). if > A finds 100 docs out of 1 million, can lucene restrict > the searchs on B,C,D only within the 100 docs found? Ah, this was mentioned recently on the list. I don't remember what the conclusion was, but Erik did remind us of Lucene's filters which can help in some situations. Otis > >Response to: Poor Performance when searching for 500+ > >terms (Jie Yang) > > >From: Julien Nioche <[EMAIL PROTECTED]> > Subject: Poor Performance when searching for 500+ > terms > >Date: Thu, 13 Nov 2003 12:45:50 +0100 > >Content-Type: text/plain; charset="iso-8859-1" > > > >Hello, > > > >Since there are a lot of Term objects in your Query, > >your application must > >spend a lot of time collecting information about > >those Terms. > > > >1/ Do you use RAMDirectory? Loading the whole > >Directory into memory will > >increase speed - your index must not be too big > though > > > >2/ You are probably not using the QueryParser - so > >when you are building the > >Query you could sort the Term objects inside a > >BooleanQuery. Sorting the > >Terms will reduce jumps on disk. I have no benchmarks > > >for this, but > >logically, it should have some positive effect when > >using FSDirectory. Am I wrong? > > >3/ There was a patch submitted by Dmitry > Serebrennikov > >(http://www.mail-archive.com/[EMAIL PROTECTED]/msg02762.html) > >which reduced garbage collecting by limiting the > >creation of temporary Term objects. This patch has > >not been included in Lucene code (a bug in it?). > > > >Hope it helps. > > > >Julien > > > ________________________________________________________________________ > Want to chat instantly with your online friends? Get the FREE Yahoo! > Messenger http://mail.messenger.yahoo.co.uk > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
