[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

David Mark Nemeskey (JIRA) Tue, 02 Aug 2011 13:06:57 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Mark Nemeskey updated LUCENE-3220:
----------------------------------------

    Attachment: LUCENE-3220.patch

EasySimilarity now computes norms in the same way as DefaultSimilarity.

Actually, not exactly the same way: I have not yet added the discountOverlaps 
property. I think it would be a good idea for EasySimilarity as well (it is for 
phrases, right?). What do you reckon?
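
For reference, a minimal sketch of what discountOverlaps does to the norm in 
DefaultSimilarity: when the flag is set, overlap tokens (tokens with a position 
increment of 0) are not counted toward the field length. The FieldState class 
below is a hypothetical stand-in for Lucene's FieldInvertState, and the class 
and method names are assumptions made for illustration, not code from the patch.

```java
public class DiscountOverlapsSketch {

    /** Hypothetical stand-in for the per-field state collected at index time. */
    public static final class FieldState {
        final int length;      // total number of tokens in the field
        final int numOverlap;  // tokens with position increment 0
        final float boost;     // field boost

        public FieldState(int length, int numOverlap, float boost) {
            this.length = length;
            this.numOverlap = numOverlap;
            this.boost = boost;
        }
    }

    /** Computes boost / sqrt(numTerms), optionally ignoring overlap tokens. */
    public static float computeNorm(FieldState state, boolean discountOverlaps) {
        int numTerms = discountOverlaps
                ? state.length - state.numOverlap
                : state.length;
        return state.boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        FieldState state = new FieldState(20, 4, 1.0f);
        System.out.println("discounted:     " + computeNorm(state, true));
        System.out.println("not discounted: " + computeNorm(state, false));
    }
}
```

With overlaps discounted the field looks shorter, so its norm is larger.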

I also wrote a quick test to see which norm encoding (the length stored 
directly, or 1/sqrt(length)) decodes to a value closer to the original, and the 
direct one is usually much closer (RMSE is 0.09689688608375747 vs 
0.23787634482532286). Of course, I know it is much more important that the new 
Similarities can use existing indices.
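
A comparison of this kind could be sketched as follows. This is not the test 
from the patch: the error metric (relative error of the recovered length), the 
range of lengths, and all class and method names are assumptions. The one-byte 
encoding is re-implemented here after Lucene's SmallFloat.floatToByte315 and 
byte315ToFloat so that the example is self-contained.

```java
public class NormEncodingComparison {

    private static final int ZERO = (63 - 15) << 3;

    /** Lossy one-byte float encoding, modeled on SmallFloat.floatToByte315. */
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO) {
            // Zero and negative values map to 0; underflow maps to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest value
        }
        return (byte) (small - ZERO);
    }

    /** Inverse of floatToByte315. */
    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Encodes and decodes a value, returning what a reader of the index would see. */
    public static float roundTrip(float f) {
        return byte315ToFloat(floatToByte315(f));
    }

    /** RMSE of the relative length error when the length itself is stored. */
    public static double rmseDirect(int maxLength) {
        double sum = 0.0;
        for (int len = 1; len <= maxLength; len++) {
            double recovered = roundTrip((float) len);
            double rel = (recovered - len) / len;
            sum += rel * rel;
        }
        return Math.sqrt(sum / maxLength);
    }

    /** RMSE of the relative length error when 1/sqrt(length) is stored. */
    public static double rmseInvSqrt(int maxLength) {
        double sum = 0.0;
        for (int len = 1; len <= maxLength; len++) {
            double norm = roundTrip((float) (1.0 / Math.sqrt(len)));
            double recovered = 1.0 / (norm * norm); // invert the norm to get the length back
            double rel = (recovered - len) / len;
            sum += rel * rel;
        }
        return Math.sqrt(sum / maxLength);
    }

    public static void main(String[] args) {
        int maxLength = 10000; // assumed range of document lengths
        System.out.println("direct: " + rmseDirect(maxLength));
        System.out.println("1/sqrt: " + rmseInvSqrt(maxLength));
    }
}
```

Inverting 1/sqrt(length) squares the decoded value, which roughly doubles its 
relative error. That is one plausible reason for the direct encoding coming out 
closer.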

> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_

