[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term & collection statistics

Robert Muir (JIRA) Sat, 11 Jun 2011 04:59:59 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047889#comment-13047889 ]


Robert Muir commented on LUCENE-3174:
-------------------------------------

{quote}
However, passing Stats to the methods you mentioned would only be possible if 
Stats already defined every possible statistic, either as public members or 
getter methods. I don't mind if it becomes like that; 
{quote}

I don't think anything needs to be in Stats itself. If I write BM25Similarity,
then I make my own BM25Similarity.BM25Stats and put what I need in it. It's
passed to my DocScorer as a Stats and I cast it to BM25Stats... done.
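A minimal, self-contained sketch of the "own Stats subclass, cast in the scorer" pattern described above. Every name here (`Stats`, `DocScorer`, `Similarity`, `computeStats`, `docScorer`, `BM25Similarity.BM25Stats`) is hypothetical and only loosely modelled on this discussion; it neither uses nor reproduces the actual flexscoring branch API. The BM25 idf and length normalization are one common variant, chosen for illustration.

```java
// Hypothetical sketch: these types are NOT the real Lucene flexscoring API.
final class StatsSketch {

    /** Base type carries no statistics of its own. */
    public abstract static class Stats {}

    /** Scores one document for one term. */
    public interface DocScorer {
        float score(int termFreq, int docLength);
    }

    public abstract static class Similarity {
        /** Computes whatever statistics this particular similarity needs. */
        public abstract Stats computeStats(long docFreq, long numDocs, long totalFieldLength);

        /** Receives back the Stats this similarity created, typed as the base class. */
        public abstract DocScorer docScorer(Stats stats);
    }

    public static class BM25Similarity extends Similarity {
        private final float k1;
        private final float b;

        public BM25Similarity(float k1, float b) {
            this.k1 = k1;
            this.b = b;
        }

        /** Only the statistics BM25 needs: idf and average field length. */
        public static class BM25Stats extends Stats {
            final float idf;
            final float avgFieldLength;

            BM25Stats(float idf, float avgFieldLength) {
                this.idf = idf;
                this.avgFieldLength = avgFieldLength;
            }
        }

        @Override
        public Stats computeStats(long docFreq, long numDocs, long totalFieldLength) {
            // idf = ln(1 + (N - df + 0.5) / (df + 0.5)), a commonly used BM25 variant
            float idf = (float) Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            float avgFieldLength = (float) totalFieldLength / numDocs;
            return new BM25Stats(idf, avgFieldLength);
        }

        @Override
        public DocScorer docScorer(Stats stats) {
            // Safe cast: this similarity only ever receives the stats it created.
            final BM25Stats s = (BM25Stats) stats;
            return (tf, len) -> {
                float norm = k1 * (1 - b + b * len / s.avgFieldLength);
                return s.idf * tf * (k1 + 1) / (tf + norm);
            };
        }
    }

    public static void main(String[] args) {
        Similarity sim = new BM25Similarity(1.2f, 0.75f);
        Stats stats = sim.computeStats(10, 1000, 100_000);
        System.out.println(sim.docScorer(stats).score(3, 100));
    }
}
```

The base `Stats` stays empty, so adding a new ranking method never requires touching it; the price is one downcast per scorer.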

{quote}
Also, I am thinking of leaving idf out of Stats in favor of df, and doing the 
computation in the DocScorers. This would make it possible to reuse the same 
Stats object e.g. for composite Similarities.
{quote}

I would disagree in this case. I think a composite similarity that has N
sub-similarities would just return a MultiStats that keeps their Stats objects
in an array: the composite doesn't care at all what's in them, it just needs to
be able to delegate them back to the subs' DocScorers later.
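A sketch of the MultiStats idea, with the same caveat: `MultiStats`, `SumSimilarity` and the toy `ConstantSimilarity` are hypothetical names, and summing the sub-scores is just one possible way to combine them. The minimal base types are repeated so the block compiles on its own.

```java
// Hypothetical sketch: these types are NOT the real Lucene flexscoring API.
final class CompositeSketch {

    public abstract static class Stats {}

    public interface DocScorer {
        float score(int termFreq, int docLength);
    }

    public abstract static class Similarity {
        public abstract Stats computeStats(long docFreq, long numDocs, long totalFieldLength);
        public abstract DocScorer docScorer(Stats stats);
    }

    /** Holds each sub-similarity's Stats without ever inspecting them. */
    public static final class MultiStats extends Stats {
        final Stats[] subStats;

        MultiStats(Stats[] subStats) {
            this.subStats = subStats;
        }
    }

    /** Composite that sums the scores of its sub-similarities. */
    public static class SumSimilarity extends Similarity {
        private final Similarity[] subs;

        public SumSimilarity(Similarity... subs) {
            this.subs = subs.clone();
        }

        @Override
        public Stats computeStats(long docFreq, long numDocs, long totalFieldLength) {
            Stats[] subStats = new Stats[subs.length];
            for (int i = 0; i < subs.length; i++) {
                subStats[i] = subs[i].computeStats(docFreq, numDocs, totalFieldLength);
            }
            return new MultiStats(subStats);
        }

        @Override
        public DocScorer docScorer(Stats stats) {
            MultiStats multi = (MultiStats) stats;
            DocScorer[] scorers = new DocScorer[subs.length];
            for (int i = 0; i < subs.length; i++) {
                // Each sub gets back exactly the Stats object it produced.
                scorers[i] = subs[i].docScorer(multi.subStats[i]);
            }
            return (tf, len) -> {
                float sum = 0;
                for (DocScorer s : scorers) {
                    sum += s.score(tf, len);
                }
                return sum;
            };
        }
    }

    /** Toy sub-similarity: score = weight * tf, with the weight stored in its own Stats. */
    public static class ConstantSimilarity extends Similarity {
        static final class WeightStats extends Stats {
            final float weight;

            WeightStats(float weight) {
                this.weight = weight;
            }
        }

        private final float weight;

        public ConstantSimilarity(float weight) {
            this.weight = weight;
        }

        @Override
        public Stats computeStats(long docFreq, long numDocs, long totalFieldLength) {
            return new WeightStats(weight);
        }

        @Override
        public DocScorer docScorer(Stats stats) {
            final WeightStats s = (WeightStats) stats;
            return (tf, len) -> s.weight * tf;
        }
    }

    public static void main(String[] args) {
        Similarity sim = new SumSimilarity(new ConstantSimilarity(1f), new ConstantSimilarity(2f));
        Stats stats = sim.computeStats(10, 1000, 100_000);
        System.out.println(sim.docScorer(stats).score(3, 100)); // prints 9.0
    }
}
```

The composite never casts or reads the sub-stats; it only hands `subStats[i]` back to `subs[i]`, so each sub-similarity remains free to use its own Stats subclass.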


> Similarity.Stats class for term & collection statistics
> -------------------------------------------------------
>
>                 Key: LUCENE-3174
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3174
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>            Priority: Minor
>             Fix For: flexscoring branch
>
>         Attachments: LUCENE-3174.patch, LUCENE-3174.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e. 
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.
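For comparison, a sketch of the first option in the description: one flat `Stats` class shared by every ranking algorithm. All names are hypothetical. The TF-IDF scorer is deliberately simplified (sqrt(tf) times the classic Lucene idf, 1 + ln(N / (df + 1)), with no norms or query weighting), and the BM25 scorer uses one common idf variant.

```java
// Hypothetical sketch of option 1: a single Stats class reused by all ranking algorithms.
final class FlatStatsSketch {

    /** One class carrying every statistic any ranking algorithm might need. */
    public static final class Stats {
        final long docFreq;
        final long numDocs;
        final float avgFieldLength; // needed by BM25, ignored by TF-IDF

        Stats(long docFreq, long numDocs, float avgFieldLength) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Simplified TF-IDF: sqrt(tf) * (1 + ln(N / (df + 1))). */
    static float tfIdfScore(Stats s, int tf) {
        double idf = 1.0 + Math.log(s.numDocs / (double) (s.docFreq + 1));
        return (float) (Math.sqrt(tf) * idf);
    }

    /** BM25 reads the same object but also uses the average field length. */
    static float bm25Score(Stats s, int tf, int docLength, float k1, float b) {
        double idf = Math.log(1 + (s.numDocs - s.docFreq + 0.5) / (s.docFreq + 0.5));
        double norm = k1 * (1 - b + b * docLength / s.avgFieldLength);
        return (float) (idf * tf * (k1 + 1) / (tf + norm));
    }

    public static void main(String[] args) {
        Stats stats = new Stats(10, 1000, 100f);
        System.out.println(tfIdfScore(stats, 4));
        System.out.println(bm25Score(stats, 3, 100, 1.2f, 0.75f));
    }
}
```

Even at this size the trade-off is visible: the TF-IDF scorer receives a field it never reads, and every new statistic means editing the shared class, whereas the hierarchy option lets each similarity declare only what it uses.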

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
