[ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669843#action_12669843 ]
Mike Klaas commented on LUCENE-1534:
------------------------------------

[quote]But if we feel that over-emphasizes terms with large idfs, then we should not remove an idf factor from one vector, but rather rework our weight heuristic, perhaps replacing idf with sqrt(idf), no?[/quote]

FWIW, having implemented web search on a large (500m) corpus, we found the stock idf factor in Lucene is too high, and ended up sqrt()'ing it in Similarity. That said, much of the score in this system came from anchor text, link analysis scores, and term proximity, so it's hard to measure the impact of the idf change independently.

> idf(t) is not actually squared during scoring?
> ----------------------------------------------
>
>                 Key: LUCENE-1534
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1534
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The javadocs for Similarity:
>
>   http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
> show idf(t) as being squared when computing net query score. But I
> don't think it is actually squared, in looking at the sources? Maybe
> it used to be, eg this interesting discussion:
>   http://markmail.org/message/k5pl7scmiac5wosb
> Or am I missing something? We just need to fix the javadocs to take
> away the "squared"...
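
For illustration only, here is a minimal sketch of what sqrt()'ing the idf factor mentioned in the comment could look like. It is self-contained and does not depend on Lucene; the class and method names are hypothetical. classicIdf follows the formula used by DefaultSimilarity in the 2.x line, log(numDocs/(docFreq+1)) + 1. In a real index the change would presumably go into an override of idf(int docFreq, int numDocs) on a Similarity subclass; the comment does not say exactly how it was done.

```java
// Hypothetical, Lucene-free sketch: compares the stock idf with a sqrt-damped idf.
final class SqrtIdfSketch {

    // Stock formula as in DefaultSimilarity (2.x): log(numDocs / (docFreq + 1)) + 1
    static float classicIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Damped variant: square root of the stock idf, so rare terms
    // still score higher, but by a smaller margin.
    static float sqrtIdf(int docFreq, int numDocs) {
        return (float) Math.sqrt(classicIdf(docFreq, numDocs));
    }

    public static void main(String[] args) {
        int numDocs = 1000000; // assumed corpus size, for illustration
        int[] docFreqs = {1, 100, 10000, 500000};
        for (int df : docFreqs) {
            System.out.println("docFreq=" + df
                    + " idf=" + classicIdf(df, numDocs)
                    + " sqrtIdf=" + sqrtIdf(df, numDocs));
        }
    }
}
```

The square root keeps the ordering of terms by rarity but compresses the gap between rare and common terms, which is the effect the comment is after when it calls the stock factor "too high".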