[
https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-7997:
--------------------------------
Attachment: LUCENE-7997_wip.patch
Updated patch that also tests floating-point tf values. For testing, we assume
computeSlopFactor returns values in the range {{(0 .. 1]}}. This found a
leftover buggy float cast in DFR {{I(F)}}, but also a new bug: Axiomatic model
F1 will most likely return NaN values if you use SloppyPhraseQuery! Frequency
values < 1 cause its first log to go negative, and once {{1 + log(freq)}} drops
below zero (freq < 1/e, about 0.37) the next log goes NaN. The formula is
{{1 + log(1 + log(freq))}}. Imagine freq=0.3: this is {{1 + log(1 + -1.2)}} =
{{1 + log(-0.2)}} = NaN. If we alter the formula to use {{log(1 + freq)}} then
the tests pass, but this needs investigation and may not be an appropriate
solution, so I marked it AwaitsFix for now.
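
The arithmetic above can be reproduced with a small standalone sketch. It is not
the Lucene code: the class and method names are made up for illustration, and
the "altered" variant assumes that only the inner {{log(freq)}} is replaced by
{{log(1 + freq)}}, which is one reading of the change described above.

```java
// Standalone illustration only; AxiomaticTfSketch, tfOriginal and tfAltered
// are hypothetical names, not part of Lucene.
public class AxiomaticTfSketch {

    // Formula quoted in the comment: 1 + log(1 + log(freq)).
    // For freq < 1/e the inner term 1 + log(freq) is negative, so the
    // outer Math.log returns NaN.
    public static double tfOriginal(double freq) {
        return 1 + Math.log(1 + Math.log(freq));
    }

    // Assumed reading of the proposed change: use log(1 + freq) as the
    // inner term, which is non-negative for every freq >= 0.
    public static double tfAltered(double freq) {
        return 1 + Math.log(1 + Math.log(1 + freq));
    }

    public static void main(String[] args) {
        double freq = 0.3; // a sloppy-phrase style frequency below 1
        System.out.println("original: " + tfOriginal(freq)); // prints "original: NaN"
        System.out.println("altered:  " + tfAltered(freq));
    }
}
```

With the altered inner term the result stays finite for any non-negative freq,
but it also changes the scores for every freq, so it only shows why the tests
pass, not that the change is appropriate.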
> More sanity testing of similarities
> -----------------------------------
>
> Key: LUCENE-7997
> URL: https://issues.apache.org/jira/browse/LUCENE-7997
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch
>
>
> LUCENE-7993 is a potential optimization that we could only apply if the
> similarity is an increasing function of {{freq}} (all other things like DF
> and length being equal). This sounds like a very reasonable requirement for a
> similarity, so we should test it in the base similarity test case and maybe
> move broken similarities to sandbox?