I have an interesting scoring problem, which I can't seem to get around. The problem is best stated as follows:
(1) My schema has several independent fields, e.g. "value_0", "value_1", ... "value_6".

(2) Every document has all of these fields set, with a-priori field norm values. Where a record has no value for a field, the document is indexed with a placeholder value ("_empty_"), whose field norm is the numerical average of all the a-priori field norms for that field.

(3) My query takes a set of terms, builds a list of combinations of these, and ORs the combinations together. For example:

    Q=Lexington Massachusetts

    Query:
    (+value_0:Lexington +value_0:Massachusetts)
    (+value_0:Lexington +value_1:Massachusetts)
    (+value_1:Lexington +value_0:Massachusetts)
    ...

The tricky part comes in when I try to explicitly add the "_empty_" matches. I need to do this because I am trying to ensure that when, say, two values are matched, the record which has only those two values scores highest, above all the records that have those two values and also a third one. So, I tried this:

    Query:
    (+value_0:Lexington +value_0:Massachusetts +value_1:_empty_ +value_2:_empty_ +value_3:_empty_ +value_4:_empty_ etc.)
    (+value_0:Lexington +value_1:Massachusetts +value_2:_empty_ etc.)
    (+value_1:Lexington +value_0:Massachusetts +value_2:_empty_ etc.)
    ...

I also needed it to be possible to match all possible values instead of _empty_ for each of the places where that occurred. Including no clause for these fields clearly messed up the queryNorm, so I fixed that by including a MatchAllDocsQuery() for each missing field, thus ensuring that the number of query clauses was identical for every combination.

Nevertheless, I was still not seeing the shortest-match records being scored to the top. So I tried to boost the _empty_ matches, like this:

    (+value_0:Lexington +value_0:Massachusetts +value_1:_empty_^1000.0 +value_2:_empty_^1000.0 +value_3:_empty_^1000.0 +value_4:_empty_^1000.0 etc.)

That, surprisingly, did not change anything.
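One way to reason about the boost experiment above is to work through the query normalization arithmetic. What follows is a minimal sketch, assuming the classic Lucene TF-IDF similarity, where each clause's query weight is idf * boost and the query norm is 1 / sqrt(sum of squared weights). It is written in Python only to keep it short. The function names are hypothetical (not Lucene API) and the idf values are made up for illustration.

```python
import math

def query_norm(raw_weights):
    """Classic TF-IDF query norm: 1 / sqrt(sum of squared clause weights)."""
    return 1.0 / math.sqrt(sum(w * w for w in raw_weights))

def normalized_weights(idfs, boosts):
    """Per-clause query weight (idf * boost), scaled by the one shared query norm."""
    raw = [idf * boost for idf, boost in zip(idfs, boosts)]
    norm = query_norm(raw)
    return [w * norm for w in raw]

# Hypothetical idf values: two term clauses, then five _empty_ clauses.
idfs = [3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# No boosts at all.
plain = normalized_weights(idfs, [1.0] * 7)

# The same boost on every clause: the query norm cancels it exactly.
all_boosted = normalized_weights(idfs, [1000.0] * 7)

# Boost only the _empty_ clauses: their share of the query weight grows.
empty_boosted = normalized_weights(idfs, [1.0, 1.0] + [1000.0] * 5)
```

Under these assumptions, a boost applied uniformly to every clause is cancelled by the query norm, while a boost on only the _empty_ clauses does shift the relative weight toward them. The query norm itself is a single factor shared by all documents for a given query, so on its own it should not reorder results. Any per-document difference has to come from tf, the field norm, or the coord factor.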
I suppose it must be because the boost is also figured into the query norm? I'm trying another experiment now, reindexing with a pre-boosted field norm for _empty_ tokens. But what I'd like to ask is: how exactly are you supposed to solve this problem in Lucene? All I want is for the minimal complete match to be scored at the top.

Karl