[ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669754#action_12669754 ]
Doug Cutting commented on LUCENE-1534:
--------------------------------------

sumOfSquaredWeights properly normalizes query vectors to the unit sphere. We can't easily do that with document vectors, since idfs change as the collection changes. So we instead use a heuristic to normalize documents, sqrt(numTokens), which is usually a good approximation. Regardless of how it's normalized, the global term weight factors twice in each addend, once from each vector.

> idf(t) is not actually squared during scoring?
> ----------------------------------------------
>
>                 Key: LUCENE-1534
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1534
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The javadocs for Similarity:
>
>   http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>
> show idf(t) as being squared when computing net query score. But I
> don't think it is actually squared, in looking at the sources? Maybe
> it used to be, eg this interesting discussion:
>
>   http://markmail.org/message/k5pl7scmiac5wosb
>
> Or am I missing something? We just need to fix the javadocs to take
> away the "squared"...
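
To illustrate the point in the comment that the global term weight enters each addend twice, here is a minimal sketch of a tf-idf dot product between a query vector and a document vector. It is not Lucene's code: the function names and sample numbers are hypothetical, and the tf and idf formulas are assumed to follow the shape of DefaultSimilarity (tf = sqrt(freq), idf = log(numDocs / (docFreq + 1)) + 1). The query is normalized by 1/sqrt(sumOfSquaredWeights) and the document by the 1/sqrt(numTokens) heuristic, as described above.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed idf shape: log(numDocs / (docFreq + 1)) + 1
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def query_weights(query_terms, doc_freqs, num_docs):
    """Query vector normalized to unit length via sumOfSquaredWeights."""
    raw = {t: idf(doc_freqs[t], num_docs) for t in query_terms}
    sum_of_squared_weights = sum(w * w for w in raw.values())
    query_norm = 1.0 / math.sqrt(sum_of_squared_weights)
    return {t: w * query_norm for t, w in raw.items()}


def doc_weights(term_freqs, doc_freqs, num_docs):
    """Document vector normalized by the sqrt(numTokens) heuristic."""
    num_tokens = sum(term_freqs.values())
    length_norm = 1.0 / math.sqrt(num_tokens)
    return {
        t: math.sqrt(freq) * idf(doc_freqs[t], num_docs) * length_norm
        for t, freq in term_freqs.items()
    }


def score(query_terms, term_freqs, doc_freqs, num_docs):
    """Dot product of the two vectors; idf appears once in each factor."""
    q = query_weights(query_terms, doc_freqs, num_docs)
    d = doc_weights(term_freqs, doc_freqs, num_docs)
    return sum(q[t] * d[t] for t in q if t in d)


# Hypothetical collection statistics and one document.
NUM_DOCS = 1000
DOC_FREQS = {"lucene": 9, "search": 99, "the": 999}
TERM_FREQS = {"lucene": 4, "the": 12}

print(score(["lucene", "search"], TERM_FREQS, DOC_FREQS, NUM_DOCS))
```

Each addend is q[t] * d[t], so idf(t) contributes once from the query weight and once from the document weight, which gives idf(t)^2 per matching term whichever normalization is applied to either vector.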