[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term & collection statistics

Robert Muir (JIRA) Sat, 11 Jun 2011 09:35:55 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047941#comment-13047941
 ]


Robert Muir commented on LUCENE-3174:
-------------------------------------

Right, but there are some serious disadvantages:

* we don't know what custom stats someone might want to integrate... they could 
be computed however they want.
* we might add newer stats, but keeping it opaque to the sim relieves us of 
backwards compatibility hassles.

For this reason I would like it completely opaque... it's not like you are gonna 
have to cast everywhere, only a single time when creating your docscorer.
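
A rough sketch of what "opaque, with one cast" could look like. All class and 
method names here (Stats, DocScorer, IdfSimilarity, the computeStats/docScorer 
signatures) are made up for illustration and are not the actual flexscoring 
branch API; the idf formula is only a placeholder.

```java
// Hypothetical names throughout; not the real flexscoring branch API.

/** Opaque carrier: the framework passes it along but never looks inside. */
abstract class Stats {}

/** Minimal stand-in for a per-document scorer. */
interface DocScorer {
    float score(int doc, int freq);
}

abstract class Similarity {
    /** Compute whatever statistics this sim needs. */
    abstract Stats computeStats(long docFreq, long maxDoc);

    /** Called once when the scorer is created. */
    abstract DocScorer docScorer(Stats stats);
}

class IdfSimilarity extends Similarity {
    /** Stats subclass known only to this sim. */
    static final class IdfStats extends Stats {
        final float idf;

        IdfStats(float idf) {
            this.idf = idf;
        }
    }

    @Override
    Stats computeStats(long docFreq, long maxDoc) {
        // Placeholder idf; any formula could go here.
        float idf = (float) (Math.log((double) maxDoc / (docFreq + 1)) + 1.0);
        return new IdfStats(idf);
    }

    @Override
    DocScorer docScorer(Stats stats) {
        // The single cast: done here, not on every score() call.
        final IdfStats s = (IdfStats) stats;
        return (doc, freq) -> (float) Math.sqrt(freq) * s.idf;
    }
}
```

The base class never needs to know which fields a given sim keeps, so adding a 
new statistic to one sim does not touch anything else.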

I agree with you that researchers want to see all the ordinary IR stats 
available, like in Terrier or whatever.
We should just make an EasySimilarity extends Similarity for them, that returns 
AllStats with all these common ones.
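
Something along these lines, as a sketch only. EasySimilarity and AllStats are 
the names from above; everything else is an assumption: OpaqueSimilarity and 
OpaqueStats stand in for Similarity and its opaque stats object, and the fields, 
signatures, and the SimpleBm25 example are invented for illustration.

```java
// Hypothetical sketch; OpaqueSimilarity/OpaqueStats stand in for the real base types.

abstract class OpaqueStats {}

interface Scorer {
    float score(int freq, int docLen);
}

abstract class OpaqueSimilarity {
    abstract OpaqueStats computeStats(long docFreq, long totalTermFreq,
                                      long numDocs, long sumFieldLengths);

    abstract Scorer scorer(OpaqueStats stats);
}

/** The common IR statistics, bundled for convenience (field choice is an assumption). */
final class AllStats extends OpaqueStats {
    final long docFreq;
    final long totalTermFreq;
    final long numDocs;
    final float avgFieldLength;

    AllStats(long docFreq, long totalTermFreq, long numDocs, float avgFieldLength) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
        this.numDocs = numDocs;
        this.avgFieldLength = avgFieldLength;
    }
}

abstract class EasySimilarity extends OpaqueSimilarity {
    @Override
    final OpaqueStats computeStats(long docFreq, long totalTermFreq,
                                   long numDocs, long sumFieldLengths) {
        return new AllStats(docFreq, totalTermFreq, numDocs,
                            (float) sumFieldLengths / numDocs);
    }

    @Override
    final Scorer scorer(OpaqueStats stats) {
        // The cast happens once here, so subclasses never have to cast.
        final AllStats all = (AllStats) stats;
        return (freq, docLen) -> score(all, freq, docLen);
    }

    /** The only method a ranking function has to implement. */
    protected abstract float score(AllStats stats, int freq, int docLen);
}

/** Example ranking function written against AllStats only. */
class SimpleBm25 extends EasySimilarity {
    private static final float K1 = 1.2f;
    private static final float B = 0.75f;

    @Override
    protected float score(AllStats s, int freq, int docLen) {
        double idf = Math.log(1.0 + (s.numDocs - s.docFreq + 0.5) / (s.docFreq + 0.5));
        double norm = K1 * (1 - B + B * docLen / s.avgFieldLength);
        return (float) (idf * freq * (K1 + 1) / (freq + norm));
    }
}
```

Someone who needs domain-specific stats would skip EasySimilarity and extend the 
base class directly with their own stats subclass.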

But I think other people are going to want to extend Lucene to their domain, so 
keeping it opaque is best: it lets them do this.


> Similarity.Stats class for term & collection statistics
> -------------------------------------------------------
>
>                 Key: LUCENE-3174
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3174
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>            Priority: Minor
>             Fix For: flexscoring branch
>
>         Attachments: LUCENE-3174.patch, LUCENE-3174.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e. 
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
