[ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105012#comment-15105012
 ] 

Robert Muir commented on LUCENE-6818:
-------------------------------------

Thanks [~iorixxx] !

The norms/spans tests were added in LUCENE-6896.

Rather than a wildcard import, I moved RandomSimilarityProvider to 
similarities/RandomSimilarity, so it's in the correct package. It's only used by 
LuceneTestCase.newSearcher.

I ran the test suite a few times to look for problems and did some 
rudimentary relevance testing of the Lucene implementation; everything seems OK.

For the Solr factory changes around discountOverlaps, can you open a separate 
issue? I'm concerned that if the factory is not initialized properly, 
other problems will show up anyway, so maybe that check should really be an 
assertion or something.


> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-6818
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6818
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/query/scoring
>    Affects Versions: 5.3
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>              Labels: similarity
>             Fix For: 5.5, Trunk
>
>         Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, 
> LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations have been added to Apache 
> Lucene. This issue aims to add the DFI model, which is the non-parametric 
> counterpart of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
> * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc.) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]
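
For readers unfamiliar with the model, here is a rough, self-contained sketch of 
the standardized DFI measure as I understand it from the paper linked in the 
description above. It is not the code from the attached patch and does not use 
any Lucene classes. The class name DfiSketch, the method names, and the 
parameter names are all made up for illustration, and the sketch assumes 
totalTokens > 0.

```java
/**
 * Illustrative sketch of a standardized DFI score.
 * Hypothetical class: NOT the implementation from the LUCENE-6818 patch.
 */
public final class DfiSketch {
    private DfiSketch() {}

    /**
     * Frequency we would expect for the term in this document if term
     * occurrence were independent of the document (assumes totalTokens > 0).
     */
    public static double expectedFrequency(long totalTermFreq, long docLength, long totalTokens) {
        return (double) totalTermFreq * docLength / totalTokens;
    }

    /**
     * Standardized divergence from independence, log2-scaled.
     * Returns 0 when the term occurs no more often than expected.
     */
    public static double score(long termFreq, long totalTermFreq, long docLength, long totalTokens) {
        double expected = expectedFrequency(totalTermFreq, docLength, totalTokens);
        if (termFreq <= expected) {
            return 0.0; // no evidence that the document is "about" the term
        }
        double measure = (termFreq - expected) / Math.sqrt(expected);
        return Math.log(measure + 1.0) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Rare term occurring 5 times in a 100-token document: positive score.
        System.out.println(score(5, 1_000, 100, 1_000_000));
        // Very common term occurring once where 50 occurrences are expected: 0.0.
        System.out.println(score(1, 500_000, 100, 1_000_000));
    }
}
```

Note that nothing here needs tuning: the score depends only on collection 
statistics. In this sketch, a very common term that occurs no more often than 
expected simply contributes zero.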


