[ 
https://issues.apache.org/jira/browse/LUCENE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shayan Tabrizi closed LUCENE-7478.
----------------------------------
    Resolution: Invalid

> Wrong Formula in LMDirichletSimilarity
> --------------------------------------
>
>                 Key: LUCENE-7478
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7478
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Shayan Tabrizi
>
> The formula in LMDirichletSimilarity appears to be wrong, or at least it is
> not the formula given in the cited C.X. Zhai paper.
> The main part of the formula in LMDirichletSimilarity is:
> Math.log(1 + freq /
>         (mu * ((LMStats)stats).getCollectionProbability())) +
>         Math.log(mu / (docLen + mu))
> which is in fact the logarithm of:
> (mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
> while the formula in the paper is:
> (mu*p(w|C)+c(w,d))/(|d| + mu)
> So the result is effectively divided by an extra p(w|C).
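
The algebra in the quoted description can be checked numerically. The
sketch below is only an illustration, not Lucene code: the class name
DirichletFormCheck, its method names, and the sample values in main are
all assumptions made for this example. It evaluates the quoted expression,
the combined form log((mu*p(w|C)+c(w,d))/(p(w|C)*(|d|+mu))), and the log
of the Dirichlet-smoothed estimate (mu*p(w|C)+c(w,d))/(|d|+mu), so the
three can be compared directly.

```java
// Hypothetical helper class for this example; it does not exist in Lucene.
final class DirichletFormCheck {
    private DirichletFormCheck() {}

    /** Expression quoted in the report: log(1 + c/(mu*p)) + log(mu/(|d| + mu)). */
    public static double quotedForm(double freq, double docLen, double mu, double collectionProb) {
        return Math.log(1 + freq / (mu * collectionProb)) + Math.log(mu / (docLen + mu));
    }

    /** The same expression written as one logarithm: log((mu*p + c) / (p * (|d| + mu))). */
    public static double combinedForm(double freq, double docLen, double mu, double collectionProb) {
        return Math.log((mu * collectionProb + freq) / (collectionProb * (docLen + mu)));
    }

    /** Log of the Dirichlet-smoothed estimate: log((mu*p + c) / (|d| + mu)). */
    public static double smoothedLogProb(double freq, double docLen, double mu, double collectionProb) {
        return Math.log((mu * collectionProb + freq) / (docLen + mu));
    }

    public static void main(String[] args) {
        // Made-up sample values, chosen only for illustration.
        double freq = 3, docLen = 120, mu = 2000, p = 0.001;

        double quoted = quotedForm(freq, docLen, mu, p);
        double combined = combinedForm(freq, docLen, mu, p);
        double smoothed = smoothedLogProb(freq, docLen, mu, p);

        // Difference between the quoted expression and its combined form.
        System.out.println("quoted - combined   = " + (quoted - combined));
        // Offset between the smoothed log-probability and the quoted expression.
        System.out.println("smoothed - quoted   = " + (smoothed - quoted));
        System.out.println("log p(w|C)          = " + Math.log(p));
    }
}
```

For positive inputs, the first difference should be zero up to
floating-point rounding, and the second should match log p(w|C). In other
words, the quoted expression differs from the log of the smoothed estimate
by -log p(w|C), a term that does not depend on freq or docLen.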



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
