Jochen, I'm afraid I didn't fully understand your post. Nevertheless, did you consider adding prefix terms (in a separate field) as normal terms to your index? E.g. suppose your terms are numbers ranging from 0000 to 9999. You could then search the range 0250-0302 with these prefixes indexed as terms: 025 026 027 028 029 0300 0301 0302 (where 025 matches every number from 0250 to 0259), instead of all 53 terms separately. That would probably save quite a few disk head seeks for the range query.
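To make the prefix idea concrete, here is a minimal Java sketch (the class and method names are hypothetical, not part of Lucene) that computes the prefix terms covering a range of fixed-width decimal numbers. It assumes every number is indexed zero-padded to the same width and that each of its prefixes is also indexed as a term in the separate prefix field. It walks the decimal digit tree and emits a prefix as soon as every number under it falls inside the range.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixRangeCover {
    // Returns the prefix terms that together cover [lo, hi] exactly,
    // assuming numbers are indexed as zero-padded strings of `width` digits.
    public static List<String> cover(long lo, long hi, int width) {
        if (width < 1 || lo > hi) {
            throw new IllegalArgumentException("need width >= 1 and lo <= hi");
        }
        List<String> terms = new ArrayList<>();
        collect("", lo, hi, width, terms);
        return terms;
    }

    private static void collect(String prefix, long lo, long hi, int width, List<String> terms) {
        long span = pow10(width - prefix.length());   // how many numbers share this prefix
        long start = (prefix.isEmpty() ? 0 : Long.parseLong(prefix)) * span;
        long end = start + span - 1;
        if (end < lo || start > hi) {
            return;                                    // block does not overlap the range
        }
        if (!prefix.isEmpty() && start >= lo && end <= hi) {
            terms.add(prefix);                         // whole block lies inside the range
            return;
        }
        for (int digit = 0; digit <= 9; digit++) {     // partial overlap: refine by one digit
            collect(prefix + digit, lo, hi, width, terms);
        }
    }

    private static long pow10(int n) {
        long result = 1;
        for (int i = 0; i < n; i++) {
            result *= 10;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [025, 026, 027, 028, 029, 0300, 0301, 0302]
        System.out.println(cover(250, 302, 4));
    }
}
```

For 0250-0302 at width 4 this yields the same eight terms as in the example above; they would then be OR-ed together against the prefix field instead of enumerating all 53 full terms.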
How are the ranges and the spans related?

Kind regards,
Ype

On Thursday 25 March 2004 18:47, Jochen Frey wrote:
> Hi There!
>
> We are in the process of building a query optimizer for Lucene RangeQueries
> (we need that because we run fairly complex Range queries with a few
> hundred terms against large corpuses, and response time needs improvement).
> We have written a framework that allows for traversing queries and
> rearranging / recreating subqueries.
>
> As a next step, we tried to find criteria to optimize. A simple one is to
> reduce the total number of terms in the query.
>
> Question 1: Is it a good idea to minimize the # of terms?
>
> Some optimization options, however, leave the choice of which term to reduce.
> In order to make that choice we are using a fairly simple cost estimator
> for queries and terms (currently we only deal with SpanNearQuery,
> SpanOrQuery and SpanTermQuery):
>
> SpanNearQuery: 10 - # of clauses + total of the cost of all clauses
> SpanOrQuery: 10 + total of the cost of all clauses
> SpanTermQuery: 1 over # of characters in the term
>
> Question 2: Does anyone have better cost estimates or comments about this?
>
> This optimization is all happening client side (i.e. as of the writing of
> this, the optimizer does not know the statistics for tokens actually stored
> in the index).
>
> Question 3: How do I get access to Term frequencies (i.e. the number of
> times a given Term appears in the index)? I assume that the way to go is
> getTermFreqVectors in IndexWriter. This should allow for better choices as
> to which term to eliminate.
>
> Question 4: What are good cost estimates assuming that we have term
> frequencies available?
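The cost estimator described in the quoted message can be sketched as a small recursive function. This is a hypothetical illustration only: it uses a minimal stand-in query tree (Term, Near, Or) rather than Lucene's SpanTermQuery, SpanNearQuery and SpanOrQuery classes, and it implements the three formulas exactly as stated, without any index statistics.

```java
import java.util.Arrays;
import java.util.List;

public class SpanCostEstimator {
    // Hypothetical stand-ins for SpanTermQuery, SpanNearQuery and SpanOrQuery.
    public interface Node {}

    public static final class Term implements Node {
        final String text;
        public Term(String text) {
            if (text.isEmpty()) throw new IllegalArgumentException("empty term");
            this.text = text;
        }
    }

    public static final class Near implements Node {
        final List<Node> clauses;
        public Near(Node... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    public static final class Or implements Node {
        final List<Node> clauses;
        public Or(Node... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Term: 1 / #characters; Near: 10 - #clauses + sum of clause costs;
    // Or: 10 + sum of clause costs.
    public static double cost(Node node) {
        if (node instanceof Term) {
            return 1.0 / ((Term) node).text.length();
        }
        if (node instanceof Near) {
            List<Node> clauses = ((Near) node).clauses;
            return 10 - clauses.size() + sumCosts(clauses);
        }
        if (node instanceof Or) {
            return 10 + sumCosts(((Or) node).clauses);
        }
        throw new IllegalArgumentException("unknown node type: " + node);
    }

    private static double sumCosts(List<Node> clauses) {
        double total = 0.0;
        for (Node clause : clauses) {
            total += cost(clause);
        }
        return total;
    }

    public static void main(String[] args) {
        // Near("ab", Or("abcd", "a")) = (10 - 2) + 0.5 + (10 + 0.25 + 1.0)
        Node query = new Near(new Term("ab"), new Or(new Term("abcd"), new Term("a")));
        System.out.println(cost(query)); // 19.75
    }
}
```

Under these formulas a shorter term costs more than a longer one, and every additional clause in a near query lowers its fixed overhead by one before the clause costs are added.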