[ https://issues.apache.org/jira/browse/LUCENE-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972833#comment-15972833 ]

ASF GitHub Bot commented on LUCENE-7498:
----------------------------------------

GitHub user alessandrobenedetti opened a pull request:

    https://github.com/apache/lucene-solr/pull/191

    Lucene-7498

    This Pull Request relates to the JIRA issue LUCENE-7498.
    
    It introduces a major refactor of the More Like This module and adds
    support for the BM25 similarity.
    
    It is not intended as a final patch, but as a foundation for a substantial
    improvement of the More Like This code base.
    Any feedback is welcome.
    
    **Summary**
    MoreLikeThis becomes a facade that only exposes the main More Like This
    functionality.
    Responsibilities are now separated into:
    - Interesting Terms retriever (from a docId in the index or from a Lucene
      Document passed as input)
    - Scorer (to measure how interesting a term is; BM25 and TF-IDF are
      supported)
    - MLT query builder (to build the query from the interesting terms)
    
    Every component has dedicated tests.
    
    As a side effect, the change also affects the following areas:
    
    **Classification**
    KNN classifiers are updated to use the refactored More Like This.
    As a result, the KNN query in Lucene will be slightly different.
    
    **Single Solr Instance**
    Solr is updated to use the refactored MLT.
    
    **SolrCloud**
    SolrCloud is updated to use the refactored MLT.
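
The Summary's split into a facade plus three collaborators (terms retriever, term scorer, query builder) can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in the pull request: the names `InterestingTermsRetriever`, `TermScorer`, `QueryBuilder`, `SimpleRetriever`, `BoostedTermsQueryBuilder` and `MoreLikeThisFacade` are hypothetical, plain strings stand in for Lucene `Document`s and queries, and a raw term-frequency scorer stands in for BM25 or TF-IDF.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the "facade + separate responsibilities" split.
// Names are illustrative only; plain strings stand in for Lucene documents/queries.
final class MltSketch {

  /** Retrieves candidate terms and their frequencies from the source text. */
  interface InterestingTermsRetriever {
    Map<String, Integer> termFrequencies(String text);
  }

  /** Scores how "interesting" a term is (e.g. TF-IDF or BM25). */
  interface TermScorer {
    double score(String term, int termFreq);
  }

  /** Builds the final query from the selected, scored terms. */
  interface QueryBuilder {
    String build(String field, List<Map.Entry<String, Double>> scoredTerms);
  }

  /** Naive retriever: lowercases and splits on non-word characters. */
  static final class SimpleRetriever implements InterestingTermsRetriever {
    @Override
    public Map<String, Integer> termFrequencies(String text) {
      Map<String, Integer> tf = new HashMap<>();
      for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
        if (!token.isEmpty()) {
          tf.merge(token, 1, Integer::sum);
        }
      }
      return tf;
    }
  }

  /** Emits one boosted "field:term^boost" clause per term, space-separated. */
  static final class BoostedTermsQueryBuilder implements QueryBuilder {
    @Override
    public String build(String field, List<Map.Entry<String, Double>> scoredTerms) {
      return scoredTerms.stream()
          .map(e -> String.format(Locale.ROOT, "%s:%s^%.2f", field, e.getKey(), e.getValue()))
          .collect(Collectors.joining(" "));
    }
  }

  /** Facade: only wires the components together and exposes one entry point. */
  static final class MoreLikeThisFacade {
    private final InterestingTermsRetriever retriever;
    private final TermScorer scorer;
    private final QueryBuilder builder;
    private final int maxQueryTerms;

    MoreLikeThisFacade(InterestingTermsRetriever retriever, TermScorer scorer,
                       QueryBuilder builder, int maxQueryTerms) {
      this.retriever = retriever;
      this.scorer = scorer;
      this.builder = builder;
      this.maxQueryTerms = maxQueryTerms;
    }

    String like(String field, String text) {
      List<Map.Entry<String, Double>> top = retriever.termFrequencies(text).entrySet().stream()
          .map(e -> Map.entry(e.getKey(), scorer.score(e.getKey(), e.getValue())))
          // Highest score first; ties broken alphabetically for a stable result.
          .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
              .thenComparing(Map.Entry.<String, Double>comparingByKey()))
          .limit(maxQueryTerms)
          .collect(Collectors.toList());
      return builder.build(field, top);
    }
  }

  /** Demo with a raw term-frequency scorer standing in for TF-IDF/BM25. */
  static String demo() {
    MoreLikeThisFacade mlt = new MoreLikeThisFacade(
        new SimpleRetriever(), (term, tf) -> tf, new BoostedTermsQueryBuilder(), 2);
    return mlt.like("body", "lucene search lucene index search lucene");
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the facade depends only on the three interfaces, switching from TF-IDF to BM25, or testing each component in isolation, comes down to supplying a different `TermScorer` (or retriever, or builder) without touching the other pieces.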
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alessandrobenedetti/lucene-solr lucene-7498

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #191
    
----
commit 5c2648aff8258472105fd1e85df806f4871d8c98
Author: Alessandro Benedetti <[email protected]>
Date:   2017-02-06T22:48:41Z

    [LUCENE-7498] initial patch

commit 562fb48acfe3cbf5df62c3818b89ab7904aa52a9
Author: Alessandro Benedetti <[email protected]>
Date:   2017-02-06T23:09:57Z

    [LUCENE-7498] minor fix in field names with boost analysis

commit 061ca863a9f2fadd0ba996c9041cc720128a127b
Author: Alessandro Benedetti <[email protected]>
Date:   2017-02-06T23:32:56Z

    [LUCENE-7498] original test was not correct, fixed

----


> More Like This to Use BM25
> --------------------------
>
>                 Key: LUCENE-7498
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7498
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Alessandro Benedetti
>
> BM25 is now the default similarity, but More Like This still uses the
> old TF/IDF.
>  
> This issue is to move MLT to BM25 and to refactor it to be more organised,
> extensible and maintainable.
> A few extensions will follow later, but the focus of this issue will be:
> - BM25
> - code refactor + tests
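
The core change requested in the issue is how candidate terms are weighted. A minimal sketch of the two weightings follows, assuming textbook formulas rather than the patch's exact code. The first is a classic TF-IDF weight (raw term frequency times 1 + ln(N/(df+1))), in the spirit of the TF/IDF the issue mentions. The second is BM25 with Lucene's default parameters k1 = 1.2 and b = 0.75. The class name `TermWeights` is hypothetical, and the document length and average length an MLT scorer would pass in are assumptions here.

```java
// Hypothetical sketch comparing two per-term weightings an MLT "Scorer" could use.
// Formulas are textbook variants (an assumption), not code taken from the patch.
final class TermWeights {
  // Default parameters of Lucene's BM25Similarity.
  static final double K1 = 1.2;
  static final double B = 0.75;

  /** Classic TF-IDF weight: raw term frequency times 1 + ln(N / (df + 1)). */
  static double tfIdf(int termFreq, long docFreq, long numDocs) {
    double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
    return termFreq * idf;
  }

  /** BM25 idf: ln(1 + (N - df + 0.5) / (df + 0.5)), positive whenever df <= N. */
  static double bm25Idf(long docFreq, long numDocs) {
    return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
  }

  /** BM25 weight: tf saturates via k1 and is normalised by document length via b. */
  static double bm25(int termFreq, long docFreq, long numDocs, double docLen, double avgDocLen) {
    double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
    return bm25Idf(docFreq, numDocs) * termFreq * (K1 + 1.0) / (termFreq + lengthNorm);
  }

  public static void main(String[] args) {
    long numDocs = 1000, docFreq = 9;
    // Raising tf from 1 to 10 multiplies TF-IDF by 10 but BM25 by less than 2.
    System.out.printf("TF-IDF ratio: %.2f%n",
        tfIdf(10, docFreq, numDocs) / tfIdf(1, docFreq, numDocs));
    System.out.printf("BM25 ratio:   %.2f%n",
        bm25(10, docFreq, numDocs, 100, 100) / bm25(1, docFreq, numDocs, 100, 100));
  }
}
```

With the document at average length, the BM25 length factor reduces to k1, so raising a term's frequency from 1 to 10 scales its BM25 weight by 22/11.2 (about 1.96) rather than by 10. This saturation keeps a single very frequent term from dominating the selected interesting terms.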



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
