[ 
https://issues.apache.org/jira/browse/LUCENE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340522#comment-15340522
 ] 

Michael McCandless commented on LUCENE-7347:
--------------------------------------------

bq. but I don't see the relationship between term saturation and coord.

The "problem" with TF/IDF is that if a single term out of the N terms in your 
boolean query occurs many times in a document, it drastically increases the 
score, because TF/IDF's term saturation is "weak": the term frequency factor 
grows as {{sqrt(termFreq)}}.  If the query is {{x OR y}} and a document has 
1000 x's and 0 y's, TF/IDF gives it a great score, even though y never 
occurred.  Coord tries to counteract that behavior.
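
To make that concrete, here is a rough sketch, not Lucene's actual scoring 
code.  It assumes every term has an idf of 1, ignores norms, uses 
{{sqrt(termFreq)}} as the tf factor, and takes coord to be the fraction of 
query terms that matched.  The function name {{tfidf_score}} and the two toy 
documents are made up for illustration.

```python
import math


def tfidf_score(term_freqs, query_terms, use_coord=True):
    """Toy TF/IDF score: sum of sqrt(termFreq) over the matching query terms.

    Assumptions (not Lucene's real formula): idf == 1 for every term, no
    length norms, coord == matched query terms / total query terms.
    """
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    score = sum(math.sqrt(term_freqs[t]) for t in matched)
    if use_coord:
        score *= len(matched) / len(query_terms)
    return score


query = ["x", "y"]
doc_a = {"x": 1000}        # 1000 x's and 0 y's
doc_b = {"x": 5, "y": 1}   # 5 x's and 1 y

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    without = tfidf_score(doc, query, use_coord=False)
    with_coord = tfidf_score(doc, query, use_coord=True)
    print(f"{name}: no coord = {without:.2f}, with coord = {with_coord:.2f}")
```

Under these assumptions the first document scores about 31.62 without coord 
and about 15.81 with it, while the second scores about 3.24 either way.  So 
coord halves the score of the document that is missing y, but in this toy 
setup the weak saturation of {{sqrt(termFreq)}} still dominates.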

BM25, by contrast, has much stronger term saturation, controlled by its {{k1}} 
parameter, such that a single term in your query occurring many times does not 
increase the score nearly as much as another term going from freq 0 to freq 1.  
BM25 naturally favors documents that have at least one occurrence of more of 
the requested query terms.  So a document with only, say, 5 x's and 1 y will 
naturally get a better score than the first document with 1000 x's and 0 y's.
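
As a rough sketch of that saturation (again, not Lucene's implementation): 
the snippet below uses the textbook BM25 tf factor 
{{tf * (k1 + 1) / (tf + k1)}}, assumes an idf of 1 for both terms, and leaves 
out length normalization entirely (equivalent to {{b = 0}}).  The value 
{{k1 = 1.2}} and the helper names are assumptions chosen for illustration.

```python
def bm25_tf(term_freq, k1=1.2):
    """Textbook BM25 tf factor with length normalization switched off (b = 0).

    The factor saturates: it approaches k1 + 1 as term_freq grows.
    """
    return term_freq * (k1 + 1) / (term_freq + k1)


def bm25_score(term_freqs, query_terms, k1=1.2):
    """Toy BM25 score: sum of saturated tf factors, assuming idf == 1."""
    return sum(bm25_tf(term_freqs.get(t, 0), k1) for t in query_terms)


query = ["x", "y"]
doc_a = {"x": 1000}        # 1000 x's and 0 y's
doc_b = {"x": 5, "y": 1}   # 5 x's and 1 y

print(f"doc_a: {bm25_score(doc_a, query):.3f}")
print(f"doc_b: {bm25_score(doc_b, query):.3f}")
```

With these assumptions the 1000 x's contribute about 2.197, just under the 
ceiling of {{k1 + 1 = 2.2}}, while the second document gets about 1.774 for 
its 5 x's plus 1.0 for its single y, about 2.774 in total.  The document 
matching both terms comes out ahead with no coord factor involved.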

> Remove queryNorm and coords
> ---------------------------
>
>                 Key: LUCENE-7347
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7347
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>
> These two features are specific to TF-IDF and introduce some complexity (see 
> eg. handling of coords in BooleanWeight) and bugs/corner-cases (see eg. how 
> taking the query norm into account causes scoring challenges on LUCENE-7337).
> Since we made BM25 the default in 6.0, I propose that we remove these 
> TF-IDF-specific features in 7.0.



