I have two slightly different queries, and am filtering to return only a 
single unique document. The scores are very slightly different, but in the 
opposite way from what my (naive) reasoning would have expected.

In the first case the query is
text:"j2ee"^2.0, text:"soa"^2.0, text:webservic, text:db2

Note that there are two boosted terms and two unboosted terms. The 
document, in fact, contains only the terms "db2" and "soa", and note that 
db2 is an unboosted term.
The score is 0.069.

The second query is
text:"db2"^2.0, text:"j2ee"^2.0, text:"soa"^2.0, text:webservic

In this case there are three boosted terms and one unboosted term. Note 
that now both db2 and soa are "boosted".
The score is 0.065, which is slightly smaller, which is the opposite of 
what I would expect, since I have two boosted terms now instead of just 
one.

Looking at the explanation object, the difference is entirely due to the 
queryNorm factor (which explain doesn't really "go into"). 
My next step is to get the Lucene source and try to step through to 
determine why queryNorm is larger in the first case than the second, but I 
was wondering if anyone out there can give a simple explanation for why it 
would differ for these two queries. I use the DefaultSimilarity class.

Many thanks in advance--
Donna 


Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]

Reply via email to