The issue that I have is well exemplified by section 3.4.5 "Combining queries: BooleanQuery" in LIA, 2nd ed. The example uses BooleanQuery to combine

- a TermQuery, for matching document topic, for which the TF-IDF scoring makes sense; and
- a NumericRangeQuery, whose purpose is to filter by publication date.
I extended the example code to output the query and the explanation:

    Title AND Date = +subject:search +pubmonth:[201001 TO 201012]
    ----------
    Lucene in Action, Second Edition
    1.6848878 = (MATCH) sum of:
      1.3560408 = (MATCH) weight(subject:search in 9), product of:
        0.9443832 = queryWeight(subject:search), product of:
          2.871802 = idf(docFreq=1, maxDocs=13)
          0.3288469 = queryNorm
        1.435901 = (MATCH) fieldWeight(subject:search in 9), product of:
          1.0 = tf(termFreq(subject:search)=1)
          2.871802 = idf(docFreq=1, maxDocs=13)
          0.5 = fieldNorm(field=subject, doc=9)
      0.3288469 = (MATCH) ConstantScoreQuery(pubmonth:[201001 TO 201012]), product of:
        1.0 = boost
        0.3288469 = queryNorm

Computing a queryNorm for the NumericRangeQuery has no meaning. Instead of simply filtering by date, this component contributes a substantial amount (0.3288) to the overall score, and its share grows when the title match has a low score.

In my own (inherited) application I have multiple textual queries, matching against different fields, combined with several NumericRangeQueries. The contributions of the latter to the scores make it hard to control the boosts of the different fields.

The logical course of action seems to me to be replacing the NumericRangeQueries with filters. This means removing the NumericRangeQueries from the overall BooleanQuery and separately building a filter that combines the corresponding NumericRangeFilters. The options that I see are:

- Use BooleanFilter.
- Use ChainFilter.
- In order to change as little code as possible, keep the code that combines all NumericRangeQueries into a BooleanQuery, and wrap that in a QueryWrapperFilter.

Q1: Are there any advantages or disadvantages (performance in particular) for each of these options?

Q2: Are there any plans to improve Lucene so that it deals in a principled way with this issue of combining TermQueries and NumericRangeQueries?
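To make the numbers in the explanation above concrete, here is a minimal sketch in plain Java (no Lucene dependency) that recomputes them. It assumes the DefaultSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sum of squared clause weights), with tf and fieldNorm taken directly from the explain output. The class and method names are hypothetical, not Lucene API, and since Lucene computes in float, the last digits can differ slightly.

```java
public class ScoreBreakdown {

    // Assumed DefaultSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)).
    public static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Assumed queryNorm: 1 / sqrt(sum of the squared weights of all scoring clauses).
    // A TermQuery contributes idf * boost; a constant-score range clause contributes its boost.
    public static double queryNorm(double... clauseWeights) {
        double sumOfSquares = 0.0;
        for (double w : clauseWeights) {
            sumOfSquares += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // Term score = queryWeight * fieldWeight, mirroring the two products in the explanation.
    public static double termScore(double idf, double queryNorm, double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = idf(1, 13);   // docFreq=1, maxDocs=13 from the explanation
        double rangeBoost = 1.0;   // boost of the ConstantScoreQuery clause

        // Case 1: the range clause is part of the BooleanQuery and takes part in scoring.
        double normBoth = queryNorm(idf, rangeBoost);
        double term = termScore(idf, normBoth, 1.0, 0.5);
        double range = rangeBoost * normBoth;
        System.out.printf("as query : term=%.7f range=%.7f total=%.7f%n",
                term, range, term + range);

        // Case 2: the range restriction is applied as a filter, so only the term is scored.
        double normTermOnly = queryNorm(idf);
        double filtered = termScore(idf, normTermOnly, 1.0, 0.5);
        System.out.printf("as filter: term=%.7f total=%.7f%n", filtered, filtered);
    }
}
```

Under these assumptions the range clause affects the result twice: it adds its own queryNorm to the sum, and it also shrinks the queryNorm applied to the term clause. With the date restriction moved into a filter, the single-term queryNorm becomes 1/idf and the document's score reduces to the term's fieldWeight.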