[ https://issues.apache.org/jira/browse/SOLR-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-8057:
---------------------------
    Attachment: SOLR-8057.patch

Really barebones strawman patch for trunk; it still needs a lot of tests for the 
new conditional behavior...

{panel}
* adds ClassicSimilarityFactory
* changes SweetSpotSimilarityFactory to extend ClassicSimilarityFactory
* updates DefaultSimilarityFactory and SchemaSimilarityFactory to make the 
"default" Sim conditional on luceneMatchVersion
** currently does this by making DefaultSimilarityFactory SolrCoreAware. I'm not 
sure this should really be needed; see the nocommit related to some dead code in 
IndexSchema that looks like it was intended to pass luceneMatchVersion to 
SimilarityFactory.init() ... need to investigate more
{panel}
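The version-conditional default described above can be sketched as follows. This is a minimal, self-contained illustration, not the patch's code: `DefaultSimChooser`, `SimKind`, and `defaultFor` are hypothetical names, the parser only accepts dotted `major.minor[.bugfix]` strings, and the 6.0.0 cutover is an assumption taken from the "(in 6.0)" wording in the issue description. Real Solr code would more likely compare against Lucene's own `Version` class (e.g. via its `onOrAfter` method).

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names are Solr or Lucene APIs.
final class DefaultSimChooser {

    /** Stand-ins for the two candidate defaults. */
    enum SimKind { CLASSIC_TFIDF, BM25 }

    /** Assumed cutover (6.0.0), based on the "(in 6.0)" wording in the issue. */
    static final int[] BM25_CUTOVER = {6, 0, 0};

    /** Parses "major.minor[.bugfix]" into int[3]; missing parts become 0. */
    static int[] parseVersion(String v) {
        String[] parts = v.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** True if a >= b, comparing major, then minor, then bugfix. */
    static boolean onOrAfter(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return a[i] > b[i];
            }
        }
        return true; // versions are equal
    }

    /** Picks the default similarity when no similarity is configured. */
    static SimKind defaultFor(String luceneMatchVersion) {
        return onOrAfter(parseVersion(luceneMatchVersion), BM25_CUTOVER)
                ? SimKind.BM25
                : SimKind.CLASSIC_TFIDF;
    }

    public static void main(String[] args) {
        for (String v : Arrays.asList("5.3.0", "5.5", "6.0.0", "6.1")) {
            System.out.println(v + " -> " + defaultFor(v));
        }
    }
}
```

Under these assumptions `5.3.0` and `5.5` keep the classic TF-IDF default while `6.0.0` and `6.1` get BM25, so a config that pins an older luceneMatchVersion keeps its old scoring.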

There are a decent number of existing test failures. Most seem to be based on 
hardcoded assumptions about exact score values, which differ between ClassicSim 
and BM25; I'll audit those.
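To see why hardcoded score assertions break, here is a simplified per-term comparison of the two models. It is only a sketch: it drops boosts, queryNorm, coord and Lucene's lossy norm encoding, and `ScoreDrift`, `classic`, and `bm25` are hypothetical names. The shapes follow the textbook ClassicSimilarity terms (sqrt(freq), idf = 1 + ln(numDocs/(docFreq+1)), lengthNorm = 1/sqrt(length)) and BM25 with Lucene's default parameters k1 = 1.2, b = 0.75.

```java
// Simplified per-term scoring sketch, NOT the exact Lucene formulas:
// ignores boosts, queryNorm, coord and the lossy 1-byte norm encoding.
final class ScoreDrift {

    /** Classic TF-IDF style: sqrt(freq) * idf * 1/sqrt(docLen). */
    static double classic(int freq, int docLen, long docFreq, long numDocs) {
        double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1));
        return Math.sqrt(freq) * idf * (1.0 / Math.sqrt(docLen));
    }

    /** BM25 with k1 = 1.2 and b = 0.75 (Lucene's BM25Similarity defaults). */
    static double bm25(int freq, int docLen, double avgDocLen, long docFreq, long numDocs) {
        final double k1 = 1.2;
        final double b = 0.75;
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthFactor = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * (freq * (k1 + 1.0)) / (freq + lengthFactor);
    }

    public static void main(String[] args) {
        // Toy corpus: 10 docs, term in 2 of them; docA has it 3x, docB 1x,
        // both docs have length 10, equal to the average length.
        long numDocs = 10;
        long docFreq = 2;
        double avgLen = 10.0;
        double classicA = classic(3, 10, docFreq, numDocs);
        double classicB = classic(1, 10, docFreq, numDocs);
        double bm25A = bm25(3, 10, avgLen, docFreq, numDocs);
        double bm25B = bm25(1, 10, avgLen, docFreq, numDocs);
        System.out.printf("classic: A=%.4f B=%.4f%n", classicA, classicB);
        System.out.printf("bm25:    A=%.4f B=%.4f%n", bm25A, bm25B);
    }
}
```

In this toy case the absolute scores change but the relative order of the two documents does not, so one option when auditing is to assert rankings rather than literal score values; whether that fits a given failing test is a case-by-case call.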

One concerning failure is from 
BadIndexSchemaTest.testPerFieldtypeSimButNoSchemaSimFactory. The javadocs say 
that "IndexSchema will provide such error checking if a non-SchemaAware 
instance of SimilarityFactory" but as soon as I made DefaultSimilarityFactory 
implement SolrCoreAware (*NOT* SchemaAware), that check seems to have broken, 
which looks like a tangentially related bug uncovered by this change.
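The contract quoted from the javadocs amounts to a check keyed on `SchemaAware` alone. A minimal sketch, assuming hypothetical stand-in types (`PerFieldSimCheck`, `FieldType`, `validate`, and local `SimilarityFactory`/`SchemaAware`/`SolrCoreAware` marker interfaces, none of which are Solr's real classes): because the decision depends only on `instanceof SchemaAware`, adding `SolrCoreAware` to a factory should not change whether per-fieldtype similarities are rejected.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the documented contract; these local types only
// mimic names from the issue and are not Solr's real interfaces.
final class PerFieldSimCheck {

    interface SimilarityFactory {}
    interface SchemaAware {}     // stand-in marker for Solr's SchemaAware
    interface SolrCoreAware {}   // stand-in marker for Solr's SolrCoreAware

    /** A field type that may or may not declare its own similarity. */
    static final class FieldType {
        final String name;
        final boolean hasOwnSimilarity;

        FieldType(String name, boolean hasOwnSimilarity) {
            this.name = name;
            this.hasOwnSimilarity = hasOwnSimilarity;
        }
    }

    /**
     * Rejects per-fieldtype similarities unless the global factory is SchemaAware.
     * Only SchemaAware is consulted, so implementing SolrCoreAware is irrelevant.
     */
    static void validate(SimilarityFactory global, List<FieldType> types) {
        if (global instanceof SchemaAware) {
            return; // a SchemaAware factory is expected to handle per-field sims itself
        }
        for (FieldType ft : types) {
            if (ft.hasOwnSimilarity) {
                throw new IllegalStateException("FieldType '" + ft.name
                        + "' specifies a similarity, but the global factory "
                        + global.getClass().getSimpleName() + " is not SchemaAware");
            }
        }
    }

    // Example factories used by main() below.
    static final class CoreAwareFactory implements SimilarityFactory, SolrCoreAware {}
    static final class SchemaAwareFactory implements SimilarityFactory, SchemaAware {}

    public static void main(String[] args) {
        List<FieldType> types = Arrays.asList(new FieldType("text", true));
        validate(new SchemaAwareFactory(), types); // accepted
        try {
            validate(new CoreAwareFactory(), types); // SolrCoreAware alone is not enough
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```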


> Change default Sim to BM25 (w/backcompat config handling)
> ---------------------------------------------------------
>
>                 Key: SOLR-8057
>                 URL: https://issues.apache.org/jira/browse/SOLR-8057
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hoss Man
>            Priority: Blocker
>             Fix For: Trunk
>
>         Attachments: SOLR-8057.patch
>
>
> LUCENE-6789 changed the default similarity for IndexSearcher to BM25 and 
> renamed "DefaultSimilarity" to "ClassicSimilarity".
> Solr needs to be updated accordingly:
> * a "ClassicSimilarityFactory" should exist w/expected behavior/javadocs
> * default behavior (in 6.0) when no similarity is specified in configs should 
> (ultimately) use BM25 depending on luceneMatchVersion
> ** either by assuming BM25SimilarityFactory or by changing the internal 
> behavior of DefaultSimilarityFactory
> * comments in sample configs need to be updated to reflect the new default behavior
> * the ref guide needs to be updated anywhere it mentions/implies that a particular 
> similarity is used (or implies TF-IDF is used by default)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
