[jira] Commented: (LUCENE-954) Toggle score normalization in Hits

Grant Ingersoll (JIRA) Fri, 22 Feb 2008 03:33:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571363#action_12571363
 ]


Grant Ingersoll commented on LUCENE-954:
----------------------------------------

{quote}
This is why the scores from Hits are currently not comparable to each other
{quote}

Do you mean across queries or within a query?  Even if you have raw scores, 
they still won't be comparable across queries, or at least that is my 
understanding of the literature.  In your original case of federated search 
across several sources, each with its own statistics, it is not well understood 
what the scores mean.  I'm not saying it can't be done (it really is the only 
way to do federated search), just that I'm not sure how much one can read into 
the scores.  Of course, this is more of a discussion for the user list than a 
JIRA issue, so I'd be happy to discuss it further there and hear other 
thoughts.  It has been a while since I have read anything on it.

That isn't to say that your patch isn't worthwhile; I'm just wondering whether 
the change is actually meaningful for your use case.

> Toggle score normalization in Hits
> ----------------------------------
>
>                 Key: LUCENE-954
>                 URL: https://issues.apache.org/jira/browse/LUCENE-954
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>         Environment: any
>            Reporter: Christian Kohlschütter
>         Attachments: hits-scoreNorm.patch
>
>
> The current implementation of the "Hits" class sometimes performs score 
> normalization.
> In particular, whenever the top-ranked score is bigger than 1.0, it is 
> normalized to a maximum of 1.0.
> In this case, Hits may return different score results than TopDocs-based 
> methods.
> In my scenario (a federated search system), Hits delivered just plain wrong 
> results.
> I was merging results from several sources, all having homogeneous statistics 
> (similar to MultiSearcher, but over the Internet using HTTP/XML-based 
> protocols).
> Sometimes, some of the sources had a top-score greater than 1, so I ended up 
> with garbled results.
> I suggest adding a switch to enable/disable this score normalization at 
> runtime.
> My patch (attached) has an additional performance benefit, since score 
> normalization now occurs only when Hits#score() is called, not when creating 
> the Hits result list. Whenever scores are not required, you save one 
> multiplication per retrieved hit (i.e., at least 100 multiplications with the 
> current implementation of Hits).
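
To make the reported behavior concrete, here is a minimal stand-alone sketch 
of the normalization as the issue describes it: scores are rescaled only when 
the top-ranked score exceeds 1.0. It is not the Lucene Hits source; the class 
HitsNormSketch, its methods, and the sample scores are hypothetical and exist 
only for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; NOT the Lucene Hits implementation.
final class HitsNormSketch {

    // Scale factor as described in the issue: applied only when the
    // top-ranked score is bigger than 1.0, otherwise scores are untouched.
    static float scoreNorm(float maxScore) {
        return maxScore > 1.0f ? 1.0f / maxScore : 1.0f;
    }

    // Expects raw scores in ranked (descending) order, so rawScores[0] is
    // the top score. Returns a rescaled copy; the input is not modified.
    static float[] normalize(float[] rawScores) {
        if (rawScores.length == 0) {
            return new float[0];
        }
        float norm = scoreNorm(rawScores[0]);
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sourceA = {2.0f, 1.0f}; // top score > 1.0: rescaled
        float[] sourceB = {0.9f, 0.6f}; // top score <= 1.0: unchanged
        System.out.println(Arrays.toString(normalize(sourceA))); // [1.0, 0.5]
        System.out.println(Arrays.toString(normalize(sourceB))); // [0.9, 0.6]
    }
}
```

With these made-up numbers, the second hit of source A has a raw score of 1.0 
and would rank above both hits of source B. After per-source normalization it 
drops to 0.5 and ranks below them, which is the kind of garbled merge order 
the reporter describes.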
