[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

David Mark Nemeskey (JIRA) Sun, 10 Jul 2011 07:44:23 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Mark Nemeskey updated LUCENE-3220:
----------------------------------------

    Attachment: LUCENE-3220.patch

 * Fixed #1
 * Added a totalBoost to EasySimilarity, and a getter method -- no one uses it 
yet
 * Added basic implementations for the Jelinek-Mercer and the Dirichlet LM 
methods.

As for the last one: the implementation is very basic for now. I want to factor 
a few things out (e.g. move p(w|C) into LMStats, possibly in a pluggable way so 
that people can implement it however they want). It also doesn't seem right to 
have the same LM method implemented twice (both as MockLMSimilarity and here), 
so I'll check whether I can merge the two. Finally, I am wondering whether I 
should implement the absolute discounting method, which, according to the 
paper, seems inferior to the Jelinek-Mercer and Dirichlet methods. Right now I 
am leaning towards "no".
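
For reference, the two smoothing methods mentioned above can be sketched as 
follows. This is only a minimal illustration of the textbook formulas, not the 
code in the attached patch: the class LMSmoothingSketch, the CollectionModel 
interface and the parameter values in main are hypothetical, and p(w|C) is 
assumed here to be the plain maximum-likelihood estimate.

```java
/** Hypothetical sketch of the two LM smoothing formulas; not the patch code. */
public final class LMSmoothingSketch {

    /** Pluggable collection model p(w|C). The interface name is an assumption. */
    public interface CollectionModel {
        double probability(long totalTermFreq, long numberOfTokens);
    }

    /** Assumed default: maximum-likelihood estimate over the whole collection. */
    public static final CollectionModel MLE =
        (totalTermFreq, numberOfTokens) -> (double) totalTermFreq / numberOfTokens;

    /**
     * Jelinek-Mercer smoothing: linear interpolation between the document
     * model and the collection model, with lambda in (0, 1).
     * p(w|d) = (1 - lambda) * tf / docLen + lambda * p(w|C)
     */
    public static double jelinekMercer(double tf, double docLen,
                                       double pwc, double lambda) {
        return (1.0 - lambda) * (tf / docLen) + lambda * pwc;
    }

    /**
     * Dirichlet prior smoothing: the collection model contributes mu
     * pseudo-counts, so longer documents are smoothed less.
     * p(w|d) = (tf + mu * p(w|C)) / (docLen + mu)
     */
    public static double dirichlet(double tf, double docLen,
                                   double pwc, double mu) {
        return (tf + mu * pwc) / (docLen + mu);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: the term occurs 3 times in a 100-token
        // document and 50 times in a 10,000-token collection.
        double pwc = MLE.probability(50, 10_000);
        System.out.println("JM:        " + jelinekMercer(3, 100, pwc, 0.1));
        System.out.println("Dirichlet: " + dirichlet(3, 100, pwc, 2000));
    }
}
```

Passing p(w|C) in as a plain value keeps the two formulas independent of how 
the collection model is computed, which is the kind of pluggability described 
above; a different CollectionModel can be swapped in without touching either 
method.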

> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

