[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mark Nemeskey updated LUCENE-3220:
----------------------------------------

    Attachment: LUCENE-3220.patch

EasySimilarity now computes norms in the same way as DefaultSimilarity. Well, not exactly the same way, as I have not yet added the discountOverlaps property. I think it would be a good idea for EasySimilarity as well (it is for phrases, right?). What do you reckon?

I also wrote a quick test to see which norm (length directly or 1/sqrt) is closer to the original value, and it seems that the direct one is usually much closer (RMSE is 0.09689688608375747 vs 0.23787634482532286). Of course, I know it is much more important that the new Similarities can use existing indices.

> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we
> can finally work on implementing the standard ranking models. Currently DFR,
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_
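
For readers who want to try the norm comparison mentioned in the comment, here is a minimal sketch of what such a test might look like. It is not the test from the patch. The class name `NormPrecisionSketch` and all of its methods are hypothetical, the single-byte encoding is reimplemented from memory after the 3-bit-mantissa scheme Lucene uses for norms (`SmallFloat.floatToByte315`) so that the block is self-contained, and the choice of relative error over document lengths 1..10000 is an assumption. The numbers it prints should not be expected to match the RMSE values quoted above.

```java
// Hypothetical sketch; not taken from the LUCENE-3220 patch.
class NormPrecisionSketch {

    // Packs a float into one byte (3 mantissa bits, 5 exponent bits).
    // Assumption: modeled on Lucene's SmallFloat.floatToByte315.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            // Zero/negative maps to 0, positive underflow to the smallest value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse of encode(); the dropped mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Store the length itself, then read it back.
    static double directRoundTrip(int length) {
        return decode(encode((float) length));
    }

    // Store 1/sqrt(length), then recover the length from the decoded norm.
    static double invSqrtRoundTrip(int length) {
        float norm = decode(encode((float) (1.0 / Math.sqrt(length))));
        return 1.0 / ((double) norm * norm);
    }

    // Returns {RMSE direct, RMSE 1/sqrt} of the relative error for lengths 1..maxLength.
    static double[] rmse(int maxLength) {
        double sumDirect = 0.0;
        double sumInvSqrt = 0.0;
        for (int len = 1; len <= maxLength; len++) {
            double errDirect = (directRoundTrip(len) - len) / len;
            double errInvSqrt = (invSqrtRoundTrip(len) - len) / len;
            sumDirect += errDirect * errDirect;
            sumInvSqrt += errInvSqrt * errInvSqrt;
        }
        return new double[] {
            Math.sqrt(sumDirect / maxLength),
            Math.sqrt(sumInvSqrt / maxLength)
        };
    }

    public static void main(String[] args) {
        double[] r = rmse(10000);
        System.out.println("RMSE direct: " + r[0] + ", RMSE 1/sqrt: " + r[1]);
    }
}
```

Under these assumptions the 1/sqrt variant loses more precision, because the truncation error in the stored norm is roughly doubled when the norm is squared to get the length back. As the comment notes, that does not outweigh compatibility with existing indices.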