[ 
https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571367#action_12571367
 ] 

Christian Kohlschütter commented on LUCENE-954:
-----------------------------------------------

Grant,

Sorry, I was perhaps not clear enough about this.

The distribution of scores of one Hits instance is currently not comparable to 
the distribution of scores of another Hits instance, even if the underlying 
statistics are comparable/compatible/identical. This is because the scores are 
normalized to a maximum of 1.0 whenever the top score exceeds 1.0.
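
To illustrate the effect, here is a minimal sketch in plain Java. It does not 
use the Lucene API; the class and method names are hypothetical, and the 
scaling rule is assumed from the issue description (scores are scaled down 
only when the top score is greater than 1.0).

```java
import java.util.Arrays;

public class ScoreNormalizationSketch {

    /**
     * Hypothetical stand-in for the normalization described in the issue:
     * if the top (first) raw score exceeds 1.0, all scores are scaled so
     * that the top score becomes 1.0; otherwise they are left unchanged.
     */
    static float[] normalize(float[] rawScoresDescending) {
        float[] out = rawScoresDescending.clone();
        if (out.length > 0 && out[0] > 1.0f) {
            float scoreNorm = 1.0f / out[0];
            for (int i = 0; i < out.length; i++) {
                out[i] *= scoreNorm;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Source A has a top raw score above 1.0, so it gets rescaled.
        float[] a = normalize(new float[] {4.0f, 2.0f});
        // Source B stays untouched because its top raw score is below 1.0.
        float[] b = normalize(new float[] {0.9f, 0.6f});

        System.out.println(Arrays.toString(a)); // [1.0, 0.5]
        System.out.println(Arrays.toString(b)); // [0.9, 0.6]
    }
}
```

The second hit of source A had a raw score of 2.0, well above both hits of 
source B, but after normalization (0.5) it ranks below them.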

As I said, my Federated Search system provides homogeneous statistics, but not 
via MultiSearcher. Instead, it uses a variant of the SRU/SRW/XCQL protocols 
("SRX/FS"), where all communication is done via HTTP and XML. This includes the 
exchange of Term/DF statistics. In the end, the system makes several 
distributed indexes appear as a single (read: federated) index. Hits is used 
to merge the results from each index.

In the simplest case, the results from every Hits object (one per source) are 
simply merged by score in descending order. With the current implementation of 
Lucene Hits, these scores are not comparable across instances. With the patch, 
they are (at least when score normalization is turned off).
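
The simple merge described above could look roughly like the following sketch. 
It is plain Java without Lucene or the SRX/FS code; ScoredHit, mergeByScore and 
the sample values are hypothetical and only assume that each source returns 
raw (unnormalized) scores computed from the same statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FederatedMergeSketch {

    /** Hypothetical value object: one hit from one source with its raw score. */
    static final class ScoredHit {
        final String docId;
        final float score;

        ScoredHit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Concatenates the per-source result lists and sorts by score, descending. */
    static List<ScoredHit> mergeByScore(List<List<ScoredHit>> perSource) {
        List<ScoredHit> merged = new ArrayList<>();
        for (List<ScoredHit> hits : perSource) {
            merged.addAll(hits);
        }
        merged.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<ScoredHit> sourceA = Arrays.asList(
                new ScoredHit("a1", 4.0f), new ScoredHit("a2", 2.0f));
        List<ScoredHit> sourceB = Arrays.asList(
                new ScoredHit("b1", 0.9f), new ScoredHit("b2", 0.6f));

        for (ScoredHit hit : mergeByScore(Arrays.asList(sourceA, sourceB))) {
            System.out.println(hit.docId + " " + hit.score);
        }
        // With raw scores the order is a1, a2, b1, b2.
    }
}
```

This only produces a meaningful ranking if the scores are left unnormalized; 
if each list were first rescaled to a maximum of 1.0, a2 would drop below b1 
and b2.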

If you need more information about the Federated Search system, we can indeed 
move the discussion to the mailing list. However, I think the problem is not 
really specific to my needs. Even if you have two Hits instances locally, you 
might want to be able to compare the scores (or merge the results) from Hits 
instance A to those from Hits instance B (in particular, when they are from the 
same index). This is also not possible right now.


> Toggle score normalization in Hits
> ----------------------------------
>
>                 Key: LUCENE-954
>                 URL: https://issues.apache.org/jira/browse/LUCENE-954
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>         Environment: any
>            Reporter: Christian Kohlschütter
>         Attachments: hits-scoreNorm.patch
>
>
> The current implementation of the "Hits" class sometimes performs score 
> normalization.
> In particular, whenever the top-ranked score is bigger than 1.0, it is 
> normalized to a maximum of 1.0.
> In this case, Hits may return different score results than TopDocs-based 
> methods.
> In my scenario (a federated search system), Hits delivered just plain wrong 
> results.
> I was merging results from several sources, all having homogeneous statistics 
> (similar to MultiSearcher, but over the Internet using HTTP/XML-based 
> protocols).
> Sometimes, some of the sources had a top-score greater than 1, so I ended up 
> with garbled results.
> I suggest adding a switch to enable/disable this score normalization at 
> runtime.
> My patch (attached) has an additional performance benefit, since score 
> normalization now occurs only when Hits#score() is called, not when creating 
> the Hits result list. Whenever scores are not required, you save one 
> multiplication per retrieved hit (i.e., at least 100 multiplications with the 
> current implementation of Hits).
