[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053329#comment-13053329
 ] 

Robert Muir commented on LUCENE-3220:
-------------------------------------

Just took a look; here are a few things that might help:

* Yes, maxDoc does not reflect deletions, but neither do statistics such as 
totalTermFreq or docFreq. So it's best not to worry about deletions in 
scoring, and to be consistent by using the stats that do not take deletions 
into account (e.g. maxDoc, not numDocs).

* For computeStats(TermContext... termContexts), isn't it weird to sum the DF 
across the different terms in this case? I honestly don't have a firm 
suggestion here. Maybe in this case we should make an EasyPhraseStats that 
computes the EasyStats for each term, so it's not hiding anything or limiting 
anyone? You could then do an instanceof check and forward to a different 
method, such as scorePhrase(), when the stats object is an EasyPhraseStats. In 
general I'm not sure how other ranking systems tend to handle this case; the 
phrase estimation for IDF in Lucene's formula is done by summing the IDFs.
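
To make the second point concrete, here is a rough sketch of per-term stats 
plus an instanceof dispatch. Everything in it is an assumption made for 
illustration: Stats, TermStats and PhraseStats are hypothetical stand-ins, not 
the EasyStats or EasyPhraseStats classes from the patch, and the idf formula 
is assumed to be the classic 1 + ln(maxDoc / (docFreq + 1)) form. Note that it 
uses maxDoc rather than numDocs, in line with the first point.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: these types are hypothetical stand-ins, not classes from the patch.
class PhraseIdfSketch {

    /** Common base type so a scorer can dispatch on the concrete stats class. */
    static class Stats {}

    /** Statistics for a single term; neither value accounts for deletions. */
    static class TermStats extends Stats {
        final long docFreq;
        final long maxDoc;

        TermStats(long docFreq, long maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Keeps one TermStats per phrase term instead of pre-summing the DFs. */
    static class PhraseStats extends Stats {
        final List<TermStats> terms;

        PhraseStats(List<TermStats> terms) {
            this.terms = terms;
        }
    }

    /** Assumed idf form: 1 + ln(maxDoc / (docFreq + 1)). */
    static double idf(TermStats s) {
        return 1.0 + Math.log((double) s.maxDoc / (double) (s.docFreq + 1));
    }

    /** Phrase idf estimated as the sum of the per-term idfs. */
    static double phraseIdf(PhraseStats p) {
        double sum = 0.0;
        for (TermStats t : p.terms) {
            sum += idf(t);
        }
        return sum;
    }

    /** Forwards to the phrase path when the stats object is a PhraseStats. */
    static double weight(Stats stats) {
        if (stats instanceof PhraseStats) {
            return phraseIdf((PhraseStats) stats);
        }
        return idf((TermStats) stats);
    }

    public static void main(String[] args) {
        TermStats first = new TermStats(9, 100);
        TermStats second = new TermStats(49, 100);
        System.out.println("term idf:   " + weight(first));
        System.out.println("phrase idf: " + weight(new PhraseStats(Arrays.asList(first, second))));
    }
}
```

Because the phrase object keeps the per-term values, a similarity that wants 
something other than a plain sum could still compute it from the same data.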


> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
