> [ ... ] Setup info & stats:
> - 4.3M documents, 12 keyword fields per document, 11
>
>     "field1:4 AND field2:188453 AND field3:1"
>
> field1:4 alone selects around 4.2M records;
> field2:188453 alone selects around 1.6M records;
> field3:1 alone selects around 1K records.
> The whole query normally selects fewer than 50 records, and only the
> first 10 are returned (or whatever range the client requests).
The "field1:4" clause is probably dominating the cost of query execution. Clauses that match large portions of the collection are slow to evaluate. If there are not too many distinct such clauses, you can optimize by reusing a Filter, typically a QueryFilter, in place of each one.
For example, Nutch automatically translates such clauses into QueryFilters. See:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/searcher/LuceneQueryOptimizer.java?view=markup
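The filter idea can be sketched without any Lucene dependency, using java.util.BitSet to stand in for a Filter's cached bit vector (the doc IDs and match ratios below are made up to mirror the reported stats, and FilterSketch is a hypothetical name, not a Lucene class):

```java
import java.util.BitSet;

public class FilterSketch {
    // Computed once and cached, like a QueryFilter's bits: one bit per
    // document, set if the doc matches the expensive clause (e.g. field1:4).
    static BitSet cachedFilter(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // Toy data: most docs match field1:4, as in the reported stats.
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc % 10 != 0) bits.set(doc);   // ~90% of docs match
        }
        return bits;
    }

    // Candidate docs from the remaining, selective clauses (field2 AND field3).
    static BitSet candidates(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(3); bits.set(10); bits.set(41);  // toy: a handful of matches
        return bits;
    }

    // Final hits: a cheap bitwise intersection per query, instead of
    // re-evaluating the clause that touches millions of documents.
    static BitSet filteredSearch(BitSet candidates, BitSet filter) {
        BitSet result = (BitSet) candidates.clone();
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 100;
        BitSet filter = cachedFilter(maxDoc);     // built once, reused per query
        BitSet hits = filteredSearch(candidates(maxDoc), filter);
        System.out.println(hits);                 // doc 10 dropped: not in filter
    }
}
```

The payoff comes entirely from reuse: building the bit vector still scans the big clause once, but every subsequent query pays only for the intersection.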
Note that this only converts clauses whose boost is zero. Since filters do not affect ranking, we can only safely convert clauses that do not contribute to the score, i.e., those whose boost is zero. Scores might still differ in the filtered results because of Similarity.coord(), but in Nutch, Similarity.coord() is overridden to always return 1.0, so replacing clauses with filters does not alter the final scores at all.
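Why the coord() caveat matters can be shown with a toy scoring model (hypothetical per-clause scores, not Lucene's actual formula): dropping a zero-boost clause leaves the clause-score sum unchanged, but it changes the overlap/maxOverlap arguments to coord() unless coord() is constant.

```java
public class CoordSketch {
    // Toy stand-in for Lucene's Similarity.coord(overlap, maxOverlap):
    // the default rewards docs that match more of the query's clauses.
    static float defaultCoord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // Nutch's override: always 1.0, so moving clauses into a filter
    // cannot change any document's score.
    static float nutchCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Hypothetical per-clause scores for one document; the field1
        // clause has boost 0, so it adds nothing to the sum.
        float clauseSum = 0.0f /* field1:4, boost 0 */
                        + 2.0f /* field2:188453 */
                        + 5.0f /* field3:1 */;

        // A doc matching field1 and field2 but not field3: as a 3-clause
        // query it gets coord(2,3)*sum; with field1 moved into a filter it
        // matches 1 of the 2 remaining clauses, coord(1,2)*sum. The sum is
        // identical, but 2/3 != 1/2, so the default coord shifts the score.
        System.out.println(defaultCoord(2, 3) * clauseSum);
        System.out.println(defaultCoord(1, 2) * clauseSum);

        // With Nutch's constant coord, the two forms score identically.
        System.out.println(nutchCoord(2, 3) * clauseSum
                        == nutchCoord(1, 2) * clauseSum);
    }
}
```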
Doug