[ 
https://issues.apache.org/jira/browse/LUCENE-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039695#comment-16039695
 ] 

Scott Stults commented on LUCENE-6603:
--------------------------------------

Has this been made obsolete by LUCENE-7369?

> consider restrictions on what Similarity.coord() can return
> -----------------------------------------------------------
>
>                 Key: LUCENE-6603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6603
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Today, Similarity.coord() can really return anything, though all of our 
> similarities are well-behaved, issues are all about custom ones:
> {code}
>    * @param overlap the number of query terms matched in the document
>    * @param maxOverlap the total number of terms in the query
>    * @return a score factor based on term overlap with the query
>    */
>   public float coord(int overlap, int maxOverlap)
> {code}
> But problems arise when a custom similarity implements {{coord(1, 1)}} to 
> return something crazy (say 2). In this case their coord impl is ignored (see 
> LUCENE-4300). {{coord(1,1)}} is always treated as 1, or things make no sense, 
> for example A NOT B would score differently from A (which would be a simple 
> termquery and have no BQ around it). Same goes with filters, which should not 
> change scoring but would, if we didn't mandate this.
> Now we see the same problem again, with LUCENE-6585. For this optimization to 
> work in the current world, it will have to check if {{coord(N,N)}} == 1 in 
> order to do it safely. Otherwise it cannot safely collapse conjunctions for 
> such custom similarities.
> I would like to enforce that {{coord(N,N)}} is always treated as 1, to 
> prevent all these crazy codepaths for wierd, not-so-well-tested cases. So we 
> could change the current hack in BooleanWeight.coord():
> {code}
> --   } else if (maxOverlap == 1) {
> ++   } else if (overlap == maxOverlap) {
>    return 1f;
> {code}
> Alternatively though, we could change javadocs of Similarity.coord() and add 
> a check to BooleanWeight to throw an exception.
> In either case, doing this would be a break, because it would break some 
> custom sims out there, at least this one 
> (https://mail-archives.apache.org/mod_mbox/lucene-java-user/201208.mbox/%[email protected]%3E).
>  That one returns {{1/overlap}} which is kinda like averaging the scores, and 
> caused us to look into this and open bugs like LUCENE-4297 and LUCENE-4300 
> and probably others. But perhaps coord() is not the way such things should be 
> done and there is another way instead, to allow BQ to be simpler and more 
> efficient.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to