[ https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579258#action_12579258 ]
Nadav Har'El commented on LUCENE-954:
-------------------------------------

I hate to rain on the parade, but maybe instead of making small modifications to the way Hits works, it's time to deprecate it?

Hits has numerous flaws compared to the alternative interface (Searcher.search(Query, HitCollector), with TopDocCollector). It tries to "guess" in advance the number of results it should calculate (usually calculating too many, or too few and having to run the search again). It does bizarre normalization of the score (as this patch points out). It cannot be extended the way the HitCollector interface can (for an example, see the recently checked-in timed hit collector, replacing yet another suggested improvement to the Hits interface).

So I say: it's time to deprecate the Hits search(Query) method, to change the tutorials to recommend TopDocCollector instead, and to stop trying to improve Hits.

> Toggle score normalization in Hits
> ----------------------------------
>
>                 Key: LUCENE-954
>                 URL: https://issues.apache.org/jira/browse/LUCENE-954
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2, 2.3, 2.3.1, 2.4
>         Environment: any
>            Reporter: Christian Kohlschütter
>             Fix For: 2.4
>
>         Attachments: hits-scoreNorm.patch, LUCENE-954.patch
>
>
> The current implementation of the "Hits" class sometimes performs score
> normalization.
> In particular, whenever the top-ranked score is bigger than 1.0, it is
> normalized to a maximum of 1.0.
> In this case, Hits may return different score results than TopDocs-based
> methods.
> In my scenario (a federated search system), Hits delivered just plain wrong
> results.
> I was merging results from several sources, all having homogeneous statistics
> (similar to MultiSearcher, but over the Internet using HTTP/XML-based
> protocols).
> Sometimes, some of the sources had a top score greater than 1, so I ended up
> with garbled results.
> I suggest adding a switch to enable/disable this score normalization at
> runtime.
> My patch (attached) has an additional performance benefit, since score
> normalization now occurs only when Hits#score() is called, not when creating
> the Hits result list. Whenever scores are not required, you save one
> multiplication per retrieved hit (i.e., at least 100 multiplications with the
> current implementation of Hits).
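
To make the federated-search problem from the issue description concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the Lucene API: `Hit`, `scoreNorm`, `normalize`, and `mergeIds` are hypothetical names, the example scores are made up, and the rule "scale by 1/maxScore only when the top score exceeds 1.0" is taken from the description above rather than from the Hits source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ScoreNormalizationSketch {

    /** One result from one source. Hypothetical type, not a Lucene class. */
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Assumed rule: scale only when the top score is bigger than 1.0. */
    static float scoreNorm(float maxScore) {
        return maxScore > 1.0f ? 1.0f / maxScore : 1.0f;
    }

    /** Applies the per-source normalization factor to every hit. */
    static List<Hit> normalize(List<Hit> hits) {
        float max = 0f;
        for (Hit h : hits) max = Math.max(max, h.score);
        float norm = scoreNorm(max);
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) out.add(new Hit(h.id, h.score * norm));
        return out;
    }

    /** Merges two result lists by descending score and returns the ids. */
    static List<String> mergeIds(List<Hit> a, List<Hit> b) {
        List<Hit> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : all) ids.add(h.id);
        return ids;
    }

    public static void main(String[] args) {
        // Source A has a top score above 1.0; source B does not.
        List<Hit> a = Arrays.asList(new Hit("a1", 4.0f), new Hit("a2", 2.0f));
        List<Hit> b = Arrays.asList(new Hit("b1", 0.9f), new Hit("b2", 0.4f));

        System.out.println(mergeIds(a, b));                       // [a1, a2, b1, b2]
        System.out.println(mergeIds(normalize(a), normalize(b))); // [a1, b1, a2, b2]
    }
}
```

With raw scores, both hits from source A outrank source B. Once each source is normalized independently, only A is rescaled (B's top score is below 1.0), so a2 drops below b1 and the merged ranking changes. This is the kind of distortion a runtime switch for normalization would let a merging client avoid.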