[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term & collection statistics

Robert Muir (JIRA) Mon, 13 Jun 2011 07:24:56 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048575#comment-13048575 ]


Robert Muir commented on LUCENE-3174:
-------------------------------------

{quote}
Almost completely removed idf from the Weights – it still lingers in explain(). 
{quote}

Right, explain() is a big TODO of a refactoring job, and you did the right thing:
it's not easily solved until we refactor it big-time so that any arbitrary
Similarity can explain its own scoring. Not to make any promises, but I think
that by doing such a thing (letting a Similarity control how the explaining
works), we will make progress towards LUCENE-3118 too: if you customize the
scoring system for your app, you should be able to explain the scores in a way
that makes sense to your app too.
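Here is a minimal sketch of what "letting a Similarity control how the explaining works" could look like. Every name in it is hypothetical: ScoreExplanation, ExplainingSimilarity and ToyTfIdfSimilarity are not flexscoring-branch or Lucene classes. The toy formula score = sqrt(freq) * idf is likewise only an assumption for illustration. The point is that the same class that computes a score also builds its explanation, so a custom formula gets a matching explanation for free.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical explanation node; a stand-in, not Lucene's Explanation class.
final class ScoreExplanation {
    final float value;
    final String description;
    final List<ScoreExplanation> details = new ArrayList<>();

    ScoreExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }

    ScoreExplanation addDetail(ScoreExplanation detail) {
        details.add(detail);
        return this;
    }

    // Renders the tree, indenting each nesting level by two spaces.
    String render(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(value).append(" = ").append(description).append('\n');
        for (ScoreExplanation d : details) sb.append(d.render(depth + 1));
        return sb.toString();
    }
}

// Hypothetical base class: whoever defines the scoring formula also defines its explanation.
abstract class ExplainingSimilarity {
    abstract float score(int freq, float idf);

    abstract ScoreExplanation explain(int freq, float idf);
}

// Toy TF-IDF formula, assumed for illustration: score = sqrt(freq) * idf.
final class ToyTfIdfSimilarity extends ExplainingSimilarity {
    @Override
    float score(int freq, float idf) {
        return (float) Math.sqrt(freq) * idf;
    }

    @Override
    ScoreExplanation explain(int freq, float idf) {
        float tf = (float) Math.sqrt(freq);
        // The top-level value comes from score() itself, so explain() cannot drift from it.
        return new ScoreExplanation(score(freq, idf), "tf * idf")
            .addDetail(new ScoreExplanation(tf, "tf = sqrt(freq=" + freq + ")"))
            .addDetail(new ScoreExplanation(idf, "idf"));
    }
}
```

For example, `new ToyTfIdfSimilarity().explain(4, 1.5f).render(0)` yields a root line `3.0 = tf * idf` with two indented detail lines, `2.0 = tf = sqrt(freq=4)` and `1.5 = idf`.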

{quote}
The DocScorer factory methods now need both the Weight and the Stats; that's 
the best I could do for now.
{quote}

This sounds like a good step to me! Ultimately we want to pass only the Stats
to the DocScorer factory methods, but we have more work to do before that,
such as better handling of the whole boosting situation and pushing all
responsibility for query normalization into the Stats.

Once we have done this, I think Weight/Stats will make sense (except for the
naming), as it will be the parallel of Scorer/DocScorer: full responsibility
for scoring will be in the Similarity, and Weight/Scorer will only handle things
like seeking to terms, creating DocsEnums, iterating postings lists, etc. :)
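A rough sketch of that end state, under stated assumptions: the names FlexSimilarity, ToyIdfSimilarity and PostingsWalker are hypothetical, and so are the method signatures. None of this is the branch's actual API. Boost and query normalization live on the Stats, the DocScorer factory takes only the Stats, and the Scorer stand-in just walks postings and delegates every score.

```java
// Hypothetical names; they mirror the roles described above, not real signatures.
abstract class FlexSimilarity {
    // Per-query, per-term statistics; boost and query normalization are owned here.
    static class Stats {
        final float idf;
        float boost = 1f;      // assumed to be set by the Weight before scoring starts
        float queryNorm = 1f;  // likewise assumed to be pushed in by the Weight

        Stats(float idf) {
            this.idf = idf;
        }

        float weight() {
            return idf * boost * queryNorm;
        }
    }

    interface DocScorer {
        float score(int doc, int freq);
    }

    abstract Stats computeStats(long docFreq, long maxDoc);

    // The factory needs only the Stats, not the Weight.
    abstract DocScorer docScorer(Stats stats);
}

// Toy formula, assumed for illustration: score = freq * (idf * boost * queryNorm).
final class ToyIdfSimilarity extends FlexSimilarity {
    @Override
    Stats computeStats(long docFreq, long maxDoc) {
        return new Stats((float) (Math.log((double) maxDoc / (docFreq + 1)) + 1.0));
    }

    @Override
    DocScorer docScorer(Stats stats) {
        final float w = stats.weight(); // captured once, so boost must be set beforehand
        return (doc, freq) -> freq * w;
    }
}

// Stand-in for a Scorer: iterates (doc, freq) postings and delegates all scoring.
final class PostingsWalker {
    static float[] scoreAll(int[] docs, int[] freqs, FlexSimilarity.DocScorer ds) {
        float[] out = new float[docs.length];
        for (int i = 0; i < docs.length; i++) {
            out[i] = ds.score(docs[i], freqs[i]);
        }
        return out;
    }
}
```

In this design, swapping ToyIdfSimilarity for another ranking model changes nothing in PostingsWalker, which is the Scorer/DocScorer split described above.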



> Similarity.Stats class for term & collection statistics
> -------------------------------------------------------
>
>                 Key: LUCENE-3174
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3174
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>            Priority: Minor
>             Fix For: flexscoring branch
>
>         Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174_normalize_boost.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e.
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.
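The second option (a hierarchy of Stats classes) can be sketched as follows. The class names HierSimilarity, TfIdfHierSimilarity and BM25HierSimilarity are hypothetical stand-ins for the real Similarity, DefaultSimilarity and BM25Similarity. The parameter sumTotalTermFreq, meaning the total number of tokens in the field across all documents, is assumed to be available. The TF-IDF idf uses a toy form, and the BM25 formula uses the common k1 = 1.2, b = 0.75 defaults.

```java
// Hypothetical names throughout; a sketch of the "hierarchy of Stats classes" option.
abstract class HierSimilarity {
    // Root of the hierarchy: no fields; each ranking method adds only what it needs.
    abstract static class Stats {}

    // docFreq: docs containing the term; docCount: docs in the collection;
    // sumTotalTermFreq: total tokens in the field over all docs (assumed available).
    abstract Stats computeStats(long docFreq, long docCount, long sumTotalTermFreq);

    abstract double score(Stats stats, int freq, int fieldLength);
}

// TF-IDF needs a single statistic: idf.
final class TfIdfHierSimilarity extends HierSimilarity {
    static final class Stats extends HierSimilarity.Stats {
        final double idf;

        Stats(double idf) {
            this.idf = idf;
        }
    }

    @Override
    Stats computeStats(long docFreq, long docCount, long sumTotalTermFreq) {
        return new Stats(Math.log((double) docCount / (docFreq + 1)) + 1.0);
    }

    @Override
    double score(HierSimilarity.Stats stats, int freq, int fieldLength) {
        return Math.sqrt(freq) * ((Stats) stats).idf;
    }
}

// BM25 needs idf plus the average field length.
final class BM25HierSimilarity extends HierSimilarity {
    private static final double K1 = 1.2;
    private static final double B = 0.75;

    static final class Stats extends HierSimilarity.Stats {
        final double idf;
        final double avgFieldLength;

        Stats(double idf, double avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    @Override
    Stats computeStats(long docFreq, long docCount, long sumTotalTermFreq) {
        double idf = Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        return new Stats(idf, (double) sumTotalTermFreq / docCount);
    }

    @Override
    double score(HierSimilarity.Stats stats, int freq, int fieldLength) {
        Stats s = (Stats) stats;
        double norm = K1 * (1 - B + B * fieldLength / s.avgFieldLength);
        return s.idf * freq * (K1 + 1) / (freq + norm);
    }
}
```

The cost of this option is the downcast in each score(); a generic type parameter on the base class could remove it. With the single-class option, by contrast, every Similarity would carry fields it never reads.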
