Robert Muir created LUCENE-6986:
-----------------------------------
Summary: Add more DFI independence measures
Key: LUCENE-6986
URL: https://issues.apache.org/jira/browse/LUCENE-6986
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Since LUCENE-6818 we have DFISimilarity, which implements the normalized
chi-squared distance.
But there are other alternatives (as described in
http://trec.nist.gov/pubs/trec21/papers/irra.web.nb.pdf):
* normalized chi-squared: "can be used for tasks that require high precision,
against both short and long queries"
* standardized: "good at tasks that require high recall and high precision,
especially against short queries composed of a few words as in the case of
Internet searches"
* saturated: "for tasks that require high recall against long queries"
I think we should just provide the three independence measures and let the
user choose, similar to how we already do for DFR/IB/etc.
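A rough sketch of what "pluggable" could look like, to make the proposal concrete. All names here (DfiMeasuresSketch, Measure, score) are hypothetical and not existing Lucene API. The formulas are my reading of the three measures in the paper linked above, and the log2(measure + 1) wrapper with a zero score when freq <= expected is an assumption about how the similarity would combine them:

```java
/** Hypothetical sketch only; none of these names are Lucene API. */
final class DfiMeasuresSketch {

    /** The three independence measures, as read from the linked paper. */
    enum Measure {
        /** Normalized chi-squared: (freq - expected)^2 / expected. */
        CHI_SQUARED {
            @Override
            double score(double freq, double expected) {
                return (freq - expected) * (freq - expected) / expected;
            }
        },
        /** Standardized: (freq - expected) / sqrt(expected). */
        STANDARDIZED {
            @Override
            double score(double freq, double expected) {
                return (freq - expected) / Math.sqrt(expected);
            }
        },
        /** Saturated: (freq - expected) / expected. */
        SATURATED {
            @Override
            double score(double freq, double expected) {
                return (freq - expected) / expected;
            }
        };

        /** Distance of the observed term frequency from its expected value. */
        abstract double score(double freq, double expected);
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Assumed combination step: terms occurring no more often than expected
     * under independence contribute nothing; otherwise the chosen measure is
     * dampened with log2(measure + 1).
     */
    static double score(Measure measure, double freq, double expected) {
        if (freq <= expected) {
            return 0.0;
        }
        return log2(measure.score(freq, expected) + 1.0);
    }

    public static void main(String[] args) {
        // Compare the three measures for the same observed/expected pair.
        for (Measure m : Measure.values()) {
            System.out.println(m + " -> " + score(m, 4.0, 1.0));
        }
    }
}
```

The user would pick the measure at construction time, the same way a basic model or distribution is picked for the other similarities.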