[ 
https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-7997:
--------------------------------
    Attachment: LUCENE-7997_wip.patch

Updated patch, which also enforces that explain == score (exactly, with no 
floating-point differences). 

I cleaned up the BM25 explain to be transparent and to reflect how the 
calculation is actually done.
Most importantly, the explanation is now broken out as {{scaling * df * tf}}, 
the way we compute it and as described in 
http://kak.tx0.org/Information-Retrieval/TFxIDF, rather than displaying the 
"re-arranged formula" in which tf includes the {{k1 + 1}} scaling factor. Maybe 
it's an improvement for debugging too, since it pulls out the independent 
scaling factor, making it easier to see the specifics of term frequency 
saturation and IDF across docs/terms?
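
To make the two forms concrete, here is a rough sketch in Python (not the code 
in the patch). It assumes the textbook BM25 definitions: the "df" factor is 
taken to be the usual IDF term, and the scaling factor is taken to be 
{{boost * (k1 + 1)}}. All function and parameter names are made up for 
illustration. The two forms are mathematically identical but may differ in the 
last floating-point bits, which is presumably why explain has to mirror the 
real computation to match score exactly.

```python
import math


def idf(doc_freq, doc_count):
    # Textbook BM25 IDF; assumed here to be what the "df" factor stands for.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_saturation(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Saturating term-frequency component, always in [0, 1).
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq / (freq + norm)


def bm25_factored(freq, doc_freq, doc_count, doc_len, avg_doc_len,
                  k1=1.2, b=0.75, boost=1.0):
    # scaling * idf * tf, with the (k1 + 1) factor pulled out (an assumption).
    scaling = boost * (k1 + 1)
    return (scaling
            * idf(doc_freq, doc_count)
            * tf_saturation(freq, doc_len, avg_doc_len, k1, b))


def bm25_rearranged(freq, doc_freq, doc_count, doc_len, avg_doc_len,
                    k1=1.2, b=0.75, boost=1.0):
    # The "re-arranged formula": (k1 + 1) is folded into the tf numerator.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    tf = freq * (k1 + 1) / (freq + norm)
    return boost * idf(doc_freq, doc_count) * tf


if __name__ == "__main__":
    args = dict(freq=3, doc_freq=10, doc_count=1000,
                doc_len=120, avg_doc_len=100)
    print(bm25_factored(**args), bm25_rearranged(**args))
```

In the factored form, the tf component stays below 1 regardless of freq, so 
saturation is visible at a glance and the IDF factor can be compared across 
terms without the scaling mixed in.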

> More sanity testing of similarities
> -----------------------------------
>
>                 Key: LUCENE-7997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7997
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, 
> LUCENE-7997_wip.patch
>
>
> LUCENE-7993 is a potential optimization that we could only apply if the 
> similarity is an increasing function of {{freq}} (all other things like DF 
> and length being equal). This sounds like a very reasonable requirement for a 
> similarity, so we should test it in the base similarity test case and maybe 
> move broken similarities to sandbox?
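
The sanity check described in the issue could look roughly like the sketch 
below. This is Python for illustration only, not the actual base similarity 
test case; the helper and the two toy scoring functions are hypothetical. It 
interprets "increasing" as "never decreasing as freq grows", which is an 
assumption.

```python
def is_non_decreasing_in_freq(score_fn, max_freq=100):
    # Hold everything else fixed and verify the score never drops
    # when the term frequency goes up by one.
    prev = score_fn(1)
    for freq in range(2, max_freq + 1):
        cur = score_fn(freq)
        if cur < prev:
            return False
        prev = cur
    return True


def saturating_score(freq, k=1.2):
    # Toy well-behaved similarity: grows with freq and saturates below 1.
    return freq / (freq + k)


def broken_score(freq):
    # Toy broken similarity: peaks at freq == 5, then decreases.
    return freq * (10 - freq)


if __name__ == "__main__":
    print(is_non_decreasing_in_freq(saturating_score))
    print(is_non_decreasing_in_freq(broken_score))
```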



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
