[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047928#comment-13047928 ]
David Mark Nemeskey commented on LUCENE-3174:
---------------------------------------------
bq. I would disagree in this case. I think a composite similarity that has N
sub-similarities would just return a MultiStats that keeps these in an array,
as this composite doesn't care at all what's in them; it just needs to be able
to delegate them back to the subs' docscorers later.
I didn't think of that. I really like this idea.
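A minimal sketch of the composite idea, in Java. All type names here (Sim, Stats, MultiStats, CompositeSim, ConstSim, WeightStats) are hypothetical simplifications rather than the flexscoring branch API, and scoring is delegated through a plain score() method instead of per-sub docscorers:

```java
/**
 * Hypothetical, simplified model of a composite similarity. These types are
 * NOT the Lucene flexscoring API; they only illustrate the delegation pattern.
 */
final class CompositeStatsSketch {

    /** Marker for whatever statistics a similarity precomputes. */
    interface Stats {}

    /** Minimal stand-in for a Similarity: compute stats once, then score with them. */
    interface Sim {
        Stats computeStats(long docFreq, long numDocs);
        float score(Stats stats, int freq);
    }

    /** Holds one opaque Stats per sub-similarity; the composite never looks inside. */
    static final class MultiStats implements Stats {
        final Stats[] subStats;
        MultiStats(Stats[] subStats) { this.subStats = subStats; }
    }

    /** Sums the scores of N sub-similarities. */
    static final class CompositeSim implements Sim {
        private final Sim[] subs;
        CompositeSim(Sim... subs) { this.subs = subs; }

        @Override
        public Stats computeStats(long docFreq, long numDocs) {
            Stats[] s = new Stats[subs.length];
            for (int i = 0; i < subs.length; i++) {
                s[i] = subs[i].computeStats(docFreq, numDocs); // each sub builds its own stats
            }
            return new MultiStats(s);
        }

        @Override
        public float score(Stats stats, int freq) {
            // The only cast is back to our own MultiStats; each sub gets its own Stats back.
            Stats[] s = ((MultiStats) stats).subStats;
            float sum = 0f;
            for (int i = 0; i < subs.length; i++) {
                sum += subs[i].score(s[i], freq);
            }
            return sum;
        }
    }

    /** Toy stats object carrying a single weight. */
    static final class WeightStats implements Stats {
        final float weight;
        WeightStats(float weight) { this.weight = weight; }
    }

    /** Toy sub-similarity: score = weight * freq, with the weight stored in its stats. */
    static final class ConstSim implements Sim {
        private final float weight;
        ConstSim(float weight) { this.weight = weight; }

        @Override
        public Stats computeStats(long docFreq, long numDocs) { return new WeightStats(weight); }

        @Override
        public float score(Stats stats, int freq) { return ((WeightStats) stats).weight * freq; }
    }

    public static void main(String[] args) {
        Sim composite = new CompositeSim(new ConstSim(1f), new ConstSim(2f));
        Stats st = composite.computeStats(10, 100);
        System.out.println(composite.score(st, 3)); // 1*3 + 2*3
    }
}
```

The composite only casts to its own MultiStats; what each sub-similarity stores stays opaque to it.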
As for Stats, I see several advantages of a single class:
- no need for casting. It may be just me, but having to cast everywhere doesn't
feel right to me;
- all the statistics the ranking algorithms use are shown in one place, so the
user of the library doesn't need to "hunt" for this information;
- I think there will be Similarities that use the same Stats subclass, e.g.
MockLMSimilarity uses TFIDFSimilarity.IDFStats. Alternatively, it could define
its own Stats that looks exactly the same. Either solution seems a bit strange
to me;
- one fewer class to write if you want to add a new Similarity (provided you
don't need a new statistic; otherwise you have to write your own Stats subclass
and cast to it).
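To make the single-class alternative concrete, here is a rough sketch in which one Stats class carries every statistic and two similarities read it without any cast. The field names, the Sim interface and the TF-IDF and BM25 style formulas (k1 = 1.2, b = 0.75) are illustrative assumptions, not code from the patch:

```java
/**
 * Sketch of the "single Stats class" design. Names are hypothetical; the
 * formulas are common textbook forms used only to show two ranking functions
 * reading one shared Stats object without casting.
 */
final class SingleStatsSketch {

    /** One class carrying every statistic any ranking function might need. */
    static final class Stats {
        final long numberOfDocuments; // N: documents in the collection
        final long docFreq;           // df: documents containing the term
        final double avgFieldLength;  // mean field length, in tokens

        Stats(long numberOfDocuments, long docFreq, double avgFieldLength) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Every similarity receives the same concrete Stats type. */
    interface Sim {
        double score(Stats stats, int freq, int fieldLength);
    }

    /** TF-IDF style: sqrt(tf) * idf; uses N and df, ignores avgFieldLength. */
    static final class TfIdfSim implements Sim {
        @Override
        public double score(Stats s, int freq, int fieldLength) {
            double idf = 1 + Math.log((double) s.numberOfDocuments / (s.docFreq + 1));
            return Math.sqrt(freq) * idf;
        }
    }

    /** BM25 style: also needs avgFieldLength, read from the same class with no cast. */
    static final class Bm25Sim implements Sim {
        private final double k1 = 1.2, b = 0.75;

        @Override
        public double score(Stats s, int freq, int fieldLength) {
            double idf = Math.log(1 + (s.numberOfDocuments - s.docFreq + 0.5) / (s.docFreq + 0.5));
            double norm = k1 * (1 - b + b * fieldLength / s.avgFieldLength);
            return idf * freq * (k1 + 1) / (freq + norm);
        }
    }

    public static void main(String[] args) {
        Stats stats = new Stats(1000, 10, 50.0);
        for (Sim sim : new Sim[] { new TfIdfSim(), new Bm25Sim() }) {
            System.out.printf("%s: %.4f%n", sim.getClass().getSimpleName(), sim.score(stats, 3, 50));
        }
    }
}
```

The trade-off is that TfIdfSim is handed avgFieldLength even though it never uses it.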
> Similarity.Stats class for term & collection statistics
> -------------------------------------------------------
>
> Key: LUCENE-3174
> URL: https://issues.apache.org/jira/browse/LUCENE-3174
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/search
> Affects Versions: flexscoring branch
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Priority: Minor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3174.patch, LUCENE-3174.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the
> statistics they need available. These statistics could be computed in
> computeWeight (soon to become computeStats) and stored in a separate object
> for easy access. Since this object will be used solely by subclasses of
> Similarity, it should be implemented as a static nested class, i.e.
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single
> field, idf, while the one in e.g. BM25Similarity would have idf and average
> field/document length.
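For comparison, a rough sketch of the hierarchy option described in the issue: IdfStats holds only idf (as the DefaultSimilarity Stats would), and Bm25Stats extends it with the average field length. Class names and formulas are hypothetical; the point is where the downcasts appear:

```java
/**
 * Sketch of the "hierarchy of Stats" design. Class names are hypothetical
 * stand-ins, not the flexscoring branch code.
 */
final class StatsHierarchySketch {

    /** Empty base type that Similarity methods pass around. */
    static abstract class Stats {}

    /** What a TF-IDF similarity needs: just idf. */
    static class IdfStats extends Stats {
        final double idf;
        IdfStats(double idf) { this.idf = idf; }
    }

    /** BM25 additionally needs the average field length. */
    static class Bm25Stats extends IdfStats {
        final double avgFieldLength;
        Bm25Stats(double idf, double avgFieldLength) {
            super(idf);
            this.avgFieldLength = avgFieldLength;
        }
    }

    static abstract class Sim {
        abstract Stats computeStats(long docFreq, long numDocs, double avgFieldLength);
        abstract double score(Stats stats, int freq, int fieldLength);
    }

    static final class TfIdfSim extends Sim {
        @Override
        Stats computeStats(long docFreq, long numDocs, double avgFieldLength) {
            return new IdfStats(1 + Math.log((double) numDocs / (docFreq + 1)));
        }

        @Override
        double score(Stats stats, int freq, int fieldLength) {
            IdfStats s = (IdfStats) stats; // downcast needed to reach idf
            return Math.sqrt(freq) * s.idf;
        }
    }

    static final class Bm25Sim extends Sim {
        private static final double K1 = 1.2, B = 0.75;

        @Override
        Stats computeStats(long docFreq, long numDocs, double avgFieldLength) {
            double idf = Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            return new Bm25Stats(idf, avgFieldLength);
        }

        @Override
        double score(Stats stats, int freq, int fieldLength) {
            Bm25Stats s = (Bm25Stats) stats; // throws ClassCastException for a plain IdfStats
            double norm = K1 * (1 - B + B * fieldLength / s.avgFieldLength);
            return s.idf * freq * (K1 + 1) / (freq + norm);
        }
    }

    public static void main(String[] args) {
        Sim bm25 = new Bm25Sim();
        Stats stats = bm25.computeStats(10, 1000, 50.0);
        System.out.println(bm25.score(stats, 3, 50));
    }
}
```

Because Bm25Stats extends IdfStats, a TF-IDF similarity can still read BM25 stats, but handing an IdfStats to Bm25Sim only fails at runtime with a ClassCastException, which is the casting cost the comment above objects to.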
--
This message is automatically generated by JIRA.