[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Mike Klaas (JIRA) Mon, 02 Feb 2009 18:08:33 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669843#action_12669843
 ]


Mike Klaas commented on LUCENE-1534:
------------------------------------

[quote]But if we feel that over-emphasizes terms with large idfs, then we 
should not remove an idf factor from one vector, but rather rework our weight 
heuristic, perhaps replacing idf with sqrt(idf), no?[/quote]

FWIW, when we implemented web search on a large (500m) corpus, we found the 
stock idf factor in Lucene to be too high, and ended up sqrt()'ing it in 
Similarity.

That said, much of the score in this system came from anchor text, link 
analysis scores, and term proximity, so it's hard to measure the impact of the 
idf change independently.
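
To make the sqrt() change concrete, here is a minimal standalone sketch. It 
assumes the stock formula is the one used by DefaultSimilarity in Lucene 2.x, 
log(numDocs / (docFreq + 1)) + 1, and it does not subclass the real Similarity 
class so that it runs without Lucene on the classpath. The class name, method 
names, and the sample corpus size and document frequencies are hypothetical 
and chosen only for illustration.

```java
public class SqrtIdfSketch {

    // Assumed to match DefaultSimilarity.idf(int docFreq, int numDocs) in Lucene 2.x.
    static float stockIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Dampened variant: the square root of the stock idf, so rare terms
    // still score higher than common ones, but by a smaller margin.
    static float sqrtIdf(int docFreq, int numDocs) {
        return (float) Math.sqrt(stockIdf(docFreq, numDocs));
    }

    public static void main(String[] args) {
        int numDocs = 500000000;                   // illustrative corpus size
        int[] docFreqs = {10, 10000, 10000000};    // rare, medium, common term
        for (int df : docFreqs) {
            System.out.printf("docFreq=%d stock=%.3f sqrt=%.3f%n",
                    df, stockIdf(df, numDocs), sqrtIdf(df, numDocs));
        }
    }
}
```

In a real index the same effect would come from overriding idf(int, int) in a 
Similarity subclass and installing it on both the IndexWriter and the 
Searcher; the sketch above only shows how the two curves compare.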

> idf(t) is not actually squared during scoring?
> ----------------------------------------------
>
>                 Key: LUCENE-1534
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1534
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The javadocs for Similarity:
>   
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
> show idf(t) as being squared when computing net query score.  But I
> don't think it is actually squared, in looking at the sources?  Maybe
> it used to be, eg this interesting discussion:
>   http://markmail.org/message/k5pl7scmiac5wosb
> Or am I missing something?  We just need to fix the javadocs to take
> away the "squared"...



