[ https://issues.apache.org/jira/browse/LUCENE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shayan Tabrizi closed LUCENE-7478.
----------------------------------
Resolution: Invalid
> Wrong Formula in LMDirichletSimilarity
> --------------------------------------
>
> Key: LUCENE-7478
> URL: https://issues.apache.org/jira/browse/LUCENE-7478
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Shayan Tabrizi
>
> The formula in LMDirichletSimilarity appears to be wrong, or at least it is
> not the formula given in the referenced C.X. Zhai paper.
> The main part of the formula in LMDirichletSimilarity is:
> Math.log(1 + freq /
> (mu * ((LMStats)stats).getCollectionProbability())) +
> Math.log(mu / (docLen + mu))
> which is in fact the logarithm of:
> (mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
> while the formula in the paper is:
> (mu*p(w|C)+c(w,d))/(|d| + mu)
> So an extra p(w|C) effectively appears in the denominator.
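
The algebra in the report can be checked numerically. The following is a minimal sketch, not Lucene code: the class name, the method names, and the sample values are all hypothetical, and the boost factor that the real scorer applies is left out. It evaluates the quoted expression, the combined form derived from it, and the plain Dirichlet-smoothed form, and shows that the first two agree while the third differs from them by log p(w|C).

```java
public class DirichletFormulaCheck {

    // The expression quoted in the report (boost omitted):
    // log(1 + c(w,d) / (mu * p(w|C))) + log(mu / (|d| + mu))
    static double reportedScore(double freq, double docLen, double mu, double pwc) {
        return Math.log(1 + freq / (mu * pwc)) + Math.log(mu / (docLen + mu));
    }

    // The combined form from the report:
    // log((mu * p(w|C) + c(w,d)) / (p(w|C) * (|d| + mu)))
    static double combinedForm(double freq, double docLen, double mu, double pwc) {
        return Math.log((mu * pwc + freq) / (pwc * (docLen + mu)));
    }

    // The log of the Dirichlet-smoothed document model:
    // log((mu * p(w|C) + c(w,d)) / (|d| + mu))
    static double paperForm(double freq, double docLen, double mu, double pwc) {
        return Math.log((mu * pwc + freq) / (docLen + mu));
    }

    public static void main(String[] args) {
        // Hypothetical sample values, chosen only for illustration.
        double freq = 3, docLen = 120, mu = 2000, pwc = 1e-4;

        double reported = reportedScore(freq, docLen, mu, pwc);
        double combined = combinedForm(freq, docLen, mu, pwc);
        double paper = paperForm(freq, docLen, mu, pwc);

        System.out.println("reported expression: " + reported);
        System.out.println("combined form:       " + combined);
        System.out.println("paper form:          " + paper);
        // The gap between the paper form and the reported expression is log p(w|C).
        System.out.println("paper - reported:    " + (paper - reported));
        System.out.println("log p(w|C):          " + Math.log(pwc));
    }
}
```

The extra term depends only on the query term's collection probability, not on the document, so for a fixed query it shifts every matching document's per-term score by the same amount.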