[
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045598#comment-13045598
]
David Mark Nemeskey commented on LUCENE-3174:
---------------------------------------------
Here's what the patch does:
- it introduces the Similarity.Stats class and its subclasses
- renames computeWeight() to computeStats()
- fixes methods that call computeStats()
What remains to be done:
- rewrite the javadoc
- Stats will be used inside other Similarity methods: its availability should
be ensured somehow. The current solution in MockBM25Similarity is not
satisfactory because there is only one Similarity object at a time.
- MultiPhraseWeight, PhraseWeight, SpanWeight, TermWeight call computeStats() and
extract the IDFExplain object. This level of coupling is not desirable and
should be eliminated, all the more so because not all Similarity subclasses
will have an idf.
- It might not even make sense to expose computeStats()?
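One way the coupling described above could be removed is sketched below. This
is only an illustration, not the patch: every class name (OpaqueSimilarity,
OpaqueTfIdfSimilarity, OpaqueTermWeight) and every method signature is
hypothetical, and the idf/tf formulas are merely in the style of the classic
TF-IDF scoring. The idea is that the Weight keeps the Stats object as an opaque
handle and passes it back to the Similarity, so it never needs to know whether
an idf exists.

```java
// Hypothetical sketch: names are illustrative and do not mirror the real Lucene classes.
abstract class OpaqueSimilarity {
    /** Base class for statistics; callers outside Similarity never look inside. */
    static class Stats { }

    /** Collects whatever statistics this ranking method needs. */
    abstract Stats computeStats(int docFreq, int maxDoc);

    /** Scores one document using statistics produced by computeStats(). */
    abstract float score(Stats stats, int termFreq);
}

class OpaqueTfIdfSimilarity extends OpaqueSimilarity {
    /** Statistics for a TF-IDF style method: just the idf. */
    static class IdfStats extends Stats {
        final float idf;
        IdfStats(float idf) { this.idf = idf; }
    }

    @Override
    Stats computeStats(int docFreq, int maxDoc) {
        // Smoothed idf: 1 + ln(maxDoc / (docFreq + 1)).
        float idf = (float) (1.0 + Math.log((double) maxDoc / (docFreq + 1)));
        return new IdfStats(idf);
    }

    @Override
    float score(Stats stats, int termFreq) {
        // Only the Similarity that created the Stats knows its concrete type.
        IdfStats s = (IdfStats) stats;
        return (float) Math.sqrt(termFreq) * s.idf * s.idf;
    }
}

/** Stand-in for TermWeight: it stores the Stats but never extracts an idf from it. */
class OpaqueTermWeight {
    private final OpaqueSimilarity similarity;
    private final OpaqueSimilarity.Stats stats;

    OpaqueTermWeight(OpaqueSimilarity similarity, int docFreq, int maxDoc) {
        this.similarity = similarity;
        this.stats = similarity.computeStats(docFreq, maxDoc);
    }

    float score(int termFreq) {
        return similarity.score(stats, termFreq);
    }
}
```

With this shape, a Similarity without an idf only has to return a different
Stats subclass; the Weight classes would not change.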
To consider:
- it might be better if Stats were static, because they could inherit fields
from each other
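To illustrate the point about static Stats classes: in Java, a static nested
class has no enclosing instance, so a Stats class declared in one Similarity
can extend a Stats class declared in a sibling Similarity. The sketch below
uses hypothetical names (SketchSimilarity and friends, queryBoost) that are
assumptions for illustration and not part of the patch.

```java
// Hypothetical sketch; class and field names are illustrative, not the Lucene API.
abstract class SketchSimilarity {
    /** Base holder for statistics; static, so it needs no enclosing instance. */
    static class Stats {
        final float queryBoost;
        Stats(float queryBoost) { this.queryBoost = queryBoost; }
    }
}

class SketchDefaultSimilarity extends SketchSimilarity {
    /** Adds idf on top of the base statistics. */
    static class IdfStats extends SketchSimilarity.Stats {
        final float idf;
        IdfStats(float queryBoost, float idf) {
            super(queryBoost);
            this.idf = idf;
        }
    }
}

class SketchBM25Similarity extends SketchSimilarity {
    /** Inherits idf from IdfStats although the two Similarity classes are siblings. */
    static class BM25Stats extends SketchDefaultSimilarity.IdfStats {
        final float avgFieldLength;
        BM25Stats(float queryBoost, float idf, float avgFieldLength) {
            super(queryBoost, idf);
            this.avgFieldLength = avgFieldLength;
        }
    }
}
```

If the nested classes were non-static, BM25Stats could not extend IdfStats this
simply, because constructing it would require an enclosing
SketchDefaultSimilarity instance.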
> Similarity.Stats class for term & collection statistics
> -------------------------------------------------------
>
> Key: LUCENE-3174
> URL: https://issues.apache.org/jira/browse/LUCENE-3174
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/search
> Affects Versions: flexscoring branch
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Priority: Minor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3174.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the
> statistics they need available. These statistics could be computed in
> computeWeight (soon to become computeStats) and stored in a separate object
> for easy access. Since this object will be used solely by subclasses of
> Similarity, it should be implemented as a static inner class, i.e.
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single
> field, idf, while the one in e.g. BM25Similarity would have idf and average
> field/document length.
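
The two options in the description can be contrasted with a small sketch. All
names here (AllInOneStats, TfIdfStats, Bm25Stats, Bm25Scorer) are hypothetical
and chosen only for illustration. The scoring method uses the standard BM25
term formula, with k1 and b as its usual free parameters, to show why BM25
needs the average field length in addition to idf.

```java
// Option 1 (hypothetical): a single shared class with a field for every statistic.
class AllInOneStats {
    float idf;             // used by TF-IDF and BM25
    float avgFieldLength;  // used by BM25 only; stays at 0 for TF-IDF
}

// Option 2 (hypothetical): a hierarchy with one class per ranking method.
class TfIdfStats {
    final float idf;
    TfIdfStats(float idf) { this.idf = idf; }
}

class Bm25Stats extends TfIdfStats {
    final float avgFieldLength;
    Bm25Stats(float idf, float avgFieldLength) {
        super(idf);
        this.avgFieldLength = avgFieldLength;
    }
}

class Bm25Scorer {
    /**
     * Standard BM25 term score:
     * idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * fieldLength / avgFieldLength)).
     */
    static float score(Bm25Stats stats, float termFreq, float fieldLength,
                       float k1, float b) {
        float norm = k1 * (1 - b + b * fieldLength / stats.avgFieldLength);
        return stats.idf * termFreq * (k1 + 1) / (termFreq + norm);
    }
}
```

With option 1, every ranking method sees fields it may not fill in; with
option 2, the type of the Stats object documents exactly which statistics are
available.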