Robert Muir created LUCENE-6986:
-----------------------------------

             Summary: Add more DFI independence measures
                 Key: LUCENE-6986
                 URL: https://issues.apache.org/jira/browse/LUCENE-6986
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir


Since LUCENE-6818, we have DFISimilarity, which implements the normalized 
chi-squared distance.

But there are other alternatives (as described in 
http://trec.nist.gov/pubs/trec21/papers/irra.web.nb.pdf):

* normalized chi-squared: "can be used for tasks that require high precision, 
against both short and long queries"
* standardized: "good at tasks that require high recall and high precision, 
especially against short queries composed of a few words as in the case of 
Internet searches"
* saturated: "for tasks that require high recall against long queries"

I think we should just provide all three independence measures and let the 
user choose, similar to what we do for DFR/IB/etc.
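To make the proposal concrete, here is a rough sketch of what a pluggable 
independence measure could look like. The class and method names 
(DfiMeasuresSketch, Independence, score) are hypothetical and not an existing 
Lucene API. The three formulas are my reading of the paper linked above, and 
the unsmoothed expected frequency (totalTermFreq * docLen / 
numberOfFieldTokens) is an assumption of this sketch.

```java
public final class DfiMeasuresSketch {

    /** Hypothetical plug-in point: divergence of observed from expected term frequency. */
    public interface Independence {
        double score(double freq, double expected);
    }

    /** Normalized chi-squared: squared deviation divided by the expected frequency. */
    public static final Independence CHI_SQUARED =
        (freq, expected) -> (freq - expected) * (freq - expected) / expected;

    /** Standardized: deviation divided by the square root of the expected frequency. */
    public static final Independence STANDARDIZED =
        (freq, expected) -> (freq - expected) / Math.sqrt(expected);

    /** Saturated: deviation divided by the expected frequency. */
    public static final Independence SATURATED =
        (freq, expected) -> (freq - expected) / expected;

    /**
     * Scores one term in one document with the chosen measure.
     * Assumption: the expected frequency under independence is the term's
     * collection frequency scaled by the document's share of all field tokens.
     */
    public static double score(Independence measure, double freq, double docLen,
                               double totalTermFreq, double numberOfFieldTokens) {
        double expected = totalTermFreq * docLen / numberOfFieldTokens;
        if (freq <= expected) {
            // The term occurs no more often than chance predicts: no contribution.
            return 0.0;
        }
        // log2(measure + 1) keeps the result non-negative and dampens large values.
        return Math.log(measure.score(freq, expected) + 1) / Math.log(2);
    }

    private DfiMeasuresSketch() {}
}
```

With something like this, the user would pick a measure when constructing the 
similarity, e.g. score(DfiMeasuresSketch.SATURATED, ...) for long, 
recall-oriented queries.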




