[
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076171#comment-13076171
]
Robert Muir commented on LUCENE-3220:
-------------------------------------
Hi David, I was thinking that for the norm, we could store it the same way
DefaultSimilarity does. This would be especially convenient, as you could
use these similarities with the exact same index as one using Lucene's
default scoring. Also, I think (not sure!) that by using 1/sqrt we will get
better quantization from SmallFloat?
{noformat}
public byte computeNorm(FieldInvertState state) {
  final int numTerms;
  if (discountOverlaps)
    numTerms = state.getLength() - state.getNumOverlap();
  else
    numTerms = state.getLength();
  return encodeNormValue(state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms))));
}
{noformat}
For computations, you have to 'undo' the sqrt() to get the quantized length,
but that's OK since it's only done once up front and tableized, so it
won't slow anything down.
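
A minimal sketch of what the 'undo' table could look like, assuming a boost of 1.0 and a one-byte encoding modeled on Lucene's SmallFloat.floatToByte315 / byte315ToFloat. The class name NormLengthTable, the method buildLengthTable, and the local copies of the encode/decode routines are hypothetical and only there to keep the example self-contained; real code would call the Similarity's own encodeNormValue/decodeNormValue instead.

```java
// Hypothetical, self-contained sketch; not the code from the patch.
final class NormLengthTable {

    // Local stand-in for SmallFloat.floatToByte315 (assumption: same bit layout).
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Local stand-in for SmallFloat.byte315ToFloat (assumption: same bit layout).
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Index-time side: encode boost * 1/sqrt(numTerms) into one byte.
    static byte computeNorm(int numTerms, float boost) {
        return floatToByte315(boost * (float) (1.0 / Math.sqrt(numTerms)));
    }

    // Search-time side: for each of the 256 possible norm bytes, undo the
    // 1/sqrt once and cache the quantized length. Only valid if boost == 1.0.
    static float[] buildLengthTable() {
        float[] table = new float[256];
        for (int i = 0; i < 256; i++) {
            float f = byte315ToFloat((byte) i);
            table[i] = 1.0f / (f * f); // byte 0 decodes to 0, so table[0] is Infinity
        }
        return table;
    }

    public static void main(String[] args) {
        float[] lengths = buildLengthTable();
        for (int numTerms : new int[] {1, 4, 10, 100, 1000}) {
            byte norm = computeNorm(numTerms, 1.0f);
            System.out.println(numTerms + " terms -> quantized length "
                + lengths[norm & 0xff]);
        }
    }
}
```

At scoring time the lookup is then just lengths[norm & 0xff], so the sqrt never has to be undone per document. Note that with index-time boosts other than 1.0 the recovered value is no longer a pure length.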
> Implement various ranking models as Similarities
> ------------------------------------------------
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring, core/search
> Affects Versions: flexscoring branch
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we
> can finally work on implementing the standard ranking models. Currently DFR,
> BM25 and LM are on the menu.
> Done:
> * {{EasyStats}}: contains all statistics that might be relevant for a
> ranking algorithm
> * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the
> DocScorers and as much implementation detail as possible
> * _BM25_: the current "mock" implementation might be OK
> * _LM_
> * _DFR_
> * The so-called _Information-Based Models_