A similar experiment with 500 shorter queries shows a 20% speed improvement (see the xls file for details).

By "shorter query" I mean something like this:

((titre:"burgundy wines"~3 titre:"burgundy wine"~3))
((texte:"burgundy wines"~3^3.0 texte:"burgundy wine"~3^3.0))
((descr:"burgundy wines"~3^4.0 descr:"burgundy wine"~3^4.0))
((kw:"burgundy wines"~3^4.0 kw:"burgundy wine"~3^4.0))
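
As a rough illustration of the trade-off Doug describes in the quoted thread below, here is a back-of-the-envelope sketch. It is not Lucene code: the class and method names are made up for this example, and it assumes a lookup scans half an interval on average and that the in-memory term index grows in proportion to the number of indexed terms.

```java
// Back-of-the-envelope model of the TermInfosWriter.INDEX_INTERVAL trade-off.
// Hypothetical names throughout; nothing here calls the Lucene API.
public class IndexIntervalSketch {

    // Every interval-th term is kept in memory. A lookup jumps to the
    // nearest indexed term and scans forward, so on average it reads
    // about half an interval of terms (assumption of this sketch).
    public static int averageTermsScanned(int indexInterval) {
        return indexInterval / 2;
    }

    // The in-memory term index holds one entry per interval terms, so
    // shrinking the interval grows it by roughly this factor.
    public static double termIndexGrowth(int oldInterval, int newInterval) {
        return (double) oldInterval / newInterval;
    }

    public static void main(String[] args) {
        int regular = 128; // default setting mentioned in the thread
        int reduced = 16;  // value suggested in the thread

        System.out.println("avg terms scanned at 128: " + averageTermsScanned(regular));
        System.out.println("avg terms scanned at 16:  " + averageTermsScanned(reduced));
        System.out.println("modelled index growth:    " + termIndexGrowth(regular, reduced));

        // .tii sizes reported in the thread: 3398 Kb (modified) vs 488 Kb (original).
        System.out.println("reported .tii growth:     " + (3398.0 / 488.0));
    }
}
```

The model predicts that the term index grows about 8 times, which is in the same range as the roughly 7 times growth of the .tii file reported below (488 Kb to 3398 Kb).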

----- Original Message -----
From: "Julien Nioche" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, July 01, 2004 10:53 AM
Subject: Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

> I went a little deeper in my experiments with INDEX_INTERVAL. In a
> previous mail to the user list I reported a 10% improvement over the regular
> setting (128) with one of my applications.
> I refined the measurements by taking the time spent not in the whole
> application, but in a method that encapsulates Lucene searches. Only the
> search time is measured, not the access to the Documents.
>
> Two sets of queries are generated using a log of user queries from our
> application. These queries are in natural language and are expanded by our
> product into a Lucene boolean query. Attached is the boolean query generated for
> the query "Burgundy wine" - just to give you an idea of what I mean by large
> query (this one is particularly big).
>
> These queries are used on an optimized index (INDEX_INTERVAL=16) and a
> regular index. The index used for this test is 720 MB - FSDirectory on
> Fedora 1; the .tii file is 3398 KB in the modified version against 488 KB in
> the original. Both sets of queries have the same size (783). The xls file
> contains the times for both indexes sorted in decreasing order. Actually the
> numbers indicate not a single search but a group of up to 4 searches.
>
> On average, changing the index interval to 16 yields an improvement of about
> 40% compared to the regular setting.
> I will try with a bigger sample of 40,000 queries and with smaller queries
> as well.
>
> The original motivation for this feature can be found at
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg04092.html
>
> What is the best way to set up this value in IndexWriter? Maybe we could
> limit it to a few possible values like:
> DEFAULT = 128
> AVERAGE = 64
> HIGH = 32
> in order to avoid settings that are too low.
>
> Any comments or suggestions? Can anyone give feedback on this?
>
> Julien
>
>
> ----- Original Message -----
> From: "Julien Nioche" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, June 29, 2004 3:03 PM
> Subject: Re: Optimizing for long queries?
>
>
> > I ran some tests changing TermInfosWriter.INDEX_INTERVAL to 16.
> > On my application (which does a lot on top of Lucene - including SQL
> > transactions and so on) I gained 10% in time.
> > I suppose this could be a bigger improvement in other applications, because
> > the search with Lucene is not 100% of my application.
> >
> > The index used for this test is 720 MB - FSDirectory on Fedora 1;
> > the .tii file is 3398 KB in the modified version against 488 KB in the
> > original (INDEX_INTERVAL=128).
> >
> > Has anyone tried changing this value? Do you get similar results?
> >
> > Julien
> >
> > ----- Original Message -----
> > From: "Julien Nioche" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Monday, June 28, 2004 10:04 AM
> > Subject: Re: Optimizing for long queries?
> >
> >
> > > Hello Drew,
> > >
> > > I don't think it's in the FAQ.
> > >
> > > 1 - What you could do is sort your query terms in ascending alphabetical
> > > order. In my case it improved the performance a little. It would be
> > > interesting to know how it works in your case.
> > >
> > > 2 - Another solution is to play with TermInfosWriter.INDEX_INTERVAL at
> > > indexing time. I quote Doug:
> > >
> > > "..., try reducing TermInfosWriter.INDEX_INTERVAL. You'll
> > > have to re-create your indexes each time you change this constant. You
> > > might try a value like 16. This would keep the number of terms in
> > > memory from being too huge (1 of 16 terms), but would reduce the average
> > > number scanned from 64 to 8, which would be substantial. Tell me how
> > > this works.
> > > If it makes a big difference, then perhaps we should make
> > > this parameter more easily changeable."
> > >
> > > Have you used a profiler on your application? This could be useful to spot
> > > possible improvements.
> > >
> > >
> > > ----- Original Message -----
> > > From: "Drew Farris" <[EMAIL PROTECTED]>
> > > To: <[EMAIL PROTECTED]>
> > > Sent: Friday, June 25, 2004 8:24 PM
> > > Subject: Optimizing for long queries?
> > >
> > >
> > > > Apologies if this is a FAQ, but I didn't have much luck searching the
> > > > list archives for answers on this subject:
> > > >
> > > > I'm using Lucene in a context where we frequently have queries
> > > > that search for as many as 30-50 terms in a single field. Does anyone
> > > > have any thoughts concerning ways to optimize Lucene for queries of these
> > > > lengths?
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
