[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

Robert Muir (JIRA) Tue, 02 Aug 2011 05:43:56 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076171#comment-13076171
 ]


Robert Muir commented on LUCENE-3220:
-------------------------------------

Hi David, I was thinking that for the norm, we could store it the same way 
DefaultSimilarity does. This would be especially convenient, as you could 
use these similarities with the exact same index as one using Lucene's 
default scoring. Also, I think (not sure!) that by using 1/sqrt we will get 
better quantization from SmallFloat?

{noformat}
  public byte computeNorm(FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return encodeNormValue(state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms))));
  }
{noformat}

For computations, you have to 'undo' the sqrt() to get the quantized length, 
but that's OK since it's only done once up front and stored in a table, so it 
won't slow anything down.
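
To make the 'undo' step concrete, here is a rough standalone sketch of how the 256-entry table could be built. The class and method names (NormLengthTable, encodeLength, buildLengthTable, quantizedLength) are made up for illustration and are not part of the patch. The two byte/float helpers are local re-implementations modeled on SmallFloat's 3-bit-mantissa, 5-bit-exponent encoding, included only so the example runs without Lucene on the classpath. It also assumes a boost of 1.0; any other boost would be folded into the decoded value.

```java
final class NormLengthTable {

    // Assumption: local copy modeled on SmallFloat's 3 mantissa bits /
    // 5 exponent bits / zero-exponent 15 encoding, not the Lucene class itself.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            // Underflow: zero and negatives map to 0, tiny positives to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // Overflow: saturate at the largest encodable value.
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Index-time side: same formula as the quoted computeNorm, with boost = 1.0.
    public static byte encodeLength(int numTerms) {
        return floatToByte315((float) (1.0 / Math.sqrt(numTerms)));
    }

    // Done once up front: decode every possible norm byte and undo the
    // 1/sqrt to recover the quantized length.
    public static float[] buildLengthTable() {
        float[] table = new float[256];
        for (int i = 0; i < 256; i++) {
            float norm = byte315ToFloat((byte) i);
            // Byte 0 decodes to 0.0f, so its entry becomes +Infinity.
            table[i] = 1.0f / (norm * norm);
        }
        return table;
    }

    // Search-time side: a single array lookup per document.
    public static float quantizedLength(float[] table, byte norm) {
        return table[norm & 0xff];
    }
}
```

With this sketch, a field of 10 terms comes back as a quantized length of roughly 10.24, since the encoding keeps only 3 mantissa bits, while lengths such as 1, 4 and 16 round-trip exactly.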


> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_
