Similarity.Stats class for term & collection statistics
-------------------------------------------------------
Key: LUCENE-3174
URL: https://issues.apache.org/jira/browse/LUCENE-3174
Project: Lucene - Java
Issue Type: Sub-task
Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
In order to support ranking methods besides TF-IDF, we need to make the
statistics they need available. These statistics could be computed in
computeWeight (soon to become computeStats) and stored in a separate object for
easy access. Since this object will be used only by subclasses of Similarity,
it should be implemented as a static nested class, i.e. Similarity.Stats.
There are two ways this could be implemented:
- as a single Similarity.Stats class, reused by all ranking algorithms. In this
case, this class would have a member field for all statistics;
- as a hierarchy of Stats classes, one for each ranking algorithm. Each
subclass would define only the statistics needed for the ranking algorithm.
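A minimal sketch of the first option follows. It assumes a simplified, hypothetical Similarity base class with hypothetical computeStats(...) and score(...) signatures. This is not Lucene's actual API. The idf formula is the classic Lucene TF-IDF one, 1 + ln(numDocs / (docFreq + 1)), and field-length normalization is left out.

```java
// Hypothetical, simplified sketch of option 1: not Lucene's actual Similarity API.
abstract class Similarity {

    /** One Stats class holding every statistic any ranking method might need. */
    static class Stats {
        long numDocs;          // documents in the collection
        long docFreq;          // documents containing the term
        long totalTermFreq;    // occurrences of the term in the collection
        float avgFieldLength;  // average length of the field
        float idf;             // inverse document frequency
    }

    /** Called once per query term (computeStats, formerly computeWeight). */
    abstract Stats computeStats(long numDocs, long docFreq,
                                long totalTermFreq, long sumFieldLength);

    /** Scores one document using the precomputed statistics. */
    abstract float score(Stats stats, float tf, float fieldLength);
}

/** TF-IDF fills in every field but only reads idf when scoring. */
class TfIdfSimilarity extends Similarity {
    @Override
    Stats computeStats(long numDocs, long docFreq,
                       long totalTermFreq, long sumFieldLength) {
        Stats s = new Stats();
        s.numDocs = numDocs;
        s.docFreq = docFreq;
        s.totalTermFreq = totalTermFreq;
        s.avgFieldLength = (float) sumFieldLength / numDocs;
        // Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
        s.idf = (float) (1.0 + Math.log(numDocs / (double) (docFreq + 1)));
        return s;
    }

    @Override
    float score(Stats stats, float tf, float fieldLength) {
        // sqrt(tf) * idf; fieldLength and the other statistics go unused here.
        return (float) Math.sqrt(tf) * stats.idf;
    }
}
```

With this option, every ranking algorithm shares one type. The cost is that each instance carries fields its Similarity never reads.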
In the second case, the Stats class in DefaultSimilarity would have a single
field, idf, while the one in e.g. BM25Similarity would have idf and average
field/document length.
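The second option can be sketched as follows. DefaultSimilarity and BM25Similarity here are simplified, hypothetical stand-ins for the classes named above, not the real Lucene implementations. Several details are assumptions rather than anything the issue specifies:

- the method signatures;
- the BM25 parameters k1 = 1.2 and b = 0.75, which are common defaults;
- the BM25 idf variant ln(1 + (N - n + 0.5) / (n + 0.5)).

```java
// Hypothetical sketch of option 2: each Similarity declares its own Stats subclass.
abstract class Similarity {
    /** Base class with no fields; subclasses add only what they need. */
    static abstract class Stats {}

    /** Called once per query term (computeStats, formerly computeWeight). */
    abstract Stats computeStats(long numDocs, long docFreq, long sumFieldLength);

    abstract float score(Stats stats, float tf, float fieldLength);
}

class DefaultSimilarity extends Similarity {
    /** TF-IDF needs only idf. */
    static class Stats extends Similarity.Stats {
        final float idf;
        Stats(float idf) { this.idf = idf; }
    }

    @Override
    Stats computeStats(long numDocs, long docFreq, long sumFieldLength) {
        return new Stats((float) (1.0 + Math.log(numDocs / (double) (docFreq + 1))));
    }

    @Override
    float score(Similarity.Stats stats, float tf, float fieldLength) {
        // Length normalization omitted; only idf is read.
        return (float) Math.sqrt(tf) * ((Stats) stats).idf;
    }
}

class BM25Similarity extends Similarity {
    private final float k1 = 1.2f; // assumed default
    private final float b = 0.75f; // assumed default

    /** BM25 needs idf and the average field length. */
    static class Stats extends Similarity.Stats {
        final float idf;
        final float avgFieldLength;
        Stats(float idf, float avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    @Override
    Stats computeStats(long numDocs, long docFreq, long sumFieldLength) {
        // One common BM25 idf variant; the +1 inside the log keeps it positive.
        float idf = (float) Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        return new Stats(idf, (float) sumFieldLength / numDocs);
    }

    @Override
    float score(Similarity.Stats stats, float tf, float fieldLength) {
        Stats s = (Stats) stats;
        // Longer-than-average fields get a larger norm and thus a lower score.
        float norm = k1 * (1 - b + b * fieldLength / s.avgFieldLength);
        return s.idf * tf * (k1 + 1) / (tf + norm);
    }
}
```

Because score receives the base type, each subclass downcasts to its own Stats. Making Similarity generic in its stats type would be one way to avoid that cast.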