[ 
https://issues.apache.org/jira/browse/SOLR-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990072#comment-14990072
 ] 

Hoss Man commented on SOLR-8057:
--------------------------------


The more I work on this and think about it, the more I think my current 
approach of putting luceneMatchVersion conditional logic in 
DefaultSimilarityFactory is the wrong way to go (independent of the bugs that 
i seem to have uncovered in making a SimilarityFactory SolrCoreAware - which 
i'll confirm & file separately) ...

I'm starting to think that a better long term solution would be to split this 
up into 3 discrete tasks/ideas...

{panel:title=Task #1 - Deprecate/rename DefaultSimilarityFactory in 5.x}
* clone DefaultSimilarityFactory -> ClassicSimilarityFactory
* prune DefaultSimilarityFactory down to a trivial subclass of 
ClassicSimilarityFactory
** make it log a warning on init
* change default behavior of IndexSchema to use ClassicSimilarityFactory 
directly
* mark DefaultSimilarityFactory as deprecated in 5.x, remove from trunk/6.0
{panel}

Task #1 would put us in a better position moving forward: the factory names 
would map directly to the underlying implementation, leaving less ambiguity 
when an explicit factory is specified in the schema.xml (either as the main 
similarity, or as a per-field similarity)
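
A minimal sketch of what the Task #1 split could look like. The two classes 
below are simplified stand-ins, not the real Solr classes: the actual 
factories extend SimilarityFactory and are initialized with config params, 
and the method names, the no-arg init(), and the WARNINGS list (used here in 
place of a real logger) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the renamed factory; holds all of the real logic.
class ClassicSimilarityFactory {
    // Stand-in for a logger so the sketch has no external dependencies.
    static final List<String> WARNINGS = new ArrayList<>();

    void init() {
        // The real factory would read its config params here.
    }

    String similarityName() {
        return "ClassicSimilarity";
    }
}

/** @deprecated use ClassicSimilarityFactory instead */
@Deprecated
class DefaultSimilarityFactory extends ClassicSimilarityFactory {
    @Override
    void init() {
        // The only behavior left in the old name: warn, then delegate.
        WARNINGS.add("DefaultSimilarityFactory is deprecated; "
            + "use ClassicSimilarityFactory instead");
        super.init();
    }
}
```

Existing configs that name the old factory would keep working and get the 
same similarity, they would just see the warning on init.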

{panel:title="Task #2 - Make the wrapped per-field default in 
SchemaSimilarityFactory conditional on luceneMatchVersion"}
* use ClassicSimilarity as per-field default when luceneMatchVersion < 6.0
* use BM25Similarity as per-field default when luceneMatchVersion >= 6.0
{panel}

Task #2 would give us better defaults (via BM25) for people using 
SchemaSimilarityFactory moving forward, while existing users would have no back 
compat change.
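
Roughly, the Task #2 decision is a single version check: ClassicSimilarity 
for a luceneMatchVersion before 6.0, BM25Similarity from 6.0 on. The sketch 
below is a simplification under stated assumptions: the class and method 
names are hypothetical, the version is reduced to its major number, and 
similarities are represented by name only. Real code would presumably compare 
Lucene Version objects (e.g. via Version.onOrAfter) and return actual 
Similarity instances.

```java
// Hypothetical helper; not part of Solr.
class PerFieldDefaultChooser {
    /**
     * Returns the name of the similarity to use for field types that do not
     * declare their own, based on the major part of luceneMatchVersion.
     */
    static String defaultSimilarityFor(int luceneMatchMajorVersion) {
        return luceneMatchMajorVersion >= 6
            ? "BM25Similarity"      // new configs get the better default
            : "ClassicSimilarity";  // existing configs see no change
    }
}
```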

{panel:title=Task #3 - Change the implicit default Similarity on trunk}
* make the Similarity init logic in IndexSchema conditional on luceneMatchVersion
* use ClassicSimilarityFactory as default when luceneMatchVersion < 6.0
* *use SchemaSimilarityFactory as default when luceneMatchVersion >= 6.0*
** combined with Task #2, this would mean the wrapped per-field default would 
be BM25
{panel}

Task #3 is where things start to get noticeably different from the goals i 
outlined when i originally filed this jira...

As far as i can tell, the chief reason SchemaSimilarityFactory wasn't made the 
implicit default in IndexSchema when it was introduced is because of how it 
differed/differs from DefaultSimilarity/ClassicSimilarity with respect to 
multi-clause queries -- see SchemaSimilarityFactory's class javadoc notes 
relating to {{queryNorm}} and {{coord}}.  Users were expected to think about 
this trade off when making a conscious choice to switch from 
DefaultSimilarity/ClassicSimilarity to SchemaSimilarityFactory.  But (again, 
AFAICT) these discrepancies don't exist between SchemaSimilarityFactory's 
PerFieldSimilarityWrapper and BM25Similarity.   So if we want to make 
BM25Similarity the default when luceneMatchVersion >= 6.0, there doesn't seem 
to be any downside to _actually_ making SchemaSimilarityFactory (wrapping 
BM25Similarity) the default instead.
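
To make the Task #3 proposal concrete, here is a rough sketch of how the 
implicit default could be resolved. Everything in it is an assumption for 
illustration: the class and method names are hypothetical, factories are 
represented by name only, and the version is reduced to its major number. An 
explicitly configured factory always wins; the version check only applies 
when the schema declares no similarity at all.

```java
// Hypothetical helper; not the actual IndexSchema init logic.
class ImplicitSimilarityDefault {
    /**
     * @param explicitFactory  factory class named in the schema, or null if
     *                         the schema declares no similarity
     * @param luceneMatchMajor major part of luceneMatchVersion
     * @return the name of the factory that would be used
     */
    static String resolveFactory(String explicitFactory, int luceneMatchMajor) {
        if (explicitFactory != null) {
            return explicitFactory; // an explicit choice is never overridden
        }
        return luceneMatchMajor >= 6
            ? "SchemaSimilarityFactory"    // wraps a BM25 per-field default (Task #2)
            : "ClassicSimilarityFactory";  // back compat for older configs
    }
}
```

Under the alternative (original) plan, the >= 6.0 branch would name 
BM25SimilarityFactory directly instead of SchemaSimilarityFactory.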

----

Task #1 seems like a no brainer to me, and likewise Task #2 seems like a 
sensible change balancing new user experience vs backcompat -- so i'm going to 
go ahead and move forward with individual sub-tasks to tackle those (in that 
order).

If there are no concerns/objections to Task #3 by the time I get to that point, 
and if i haven't changed my mind that it's a good idea, I'll move forward with 
that as well -- the alternative is to stick with the original plan and make 
BM25SimilarityFactory (directly) the default when luceneMatchVersion >= 6.0.


> Change default Sim to BM25 (w/backcompat config handling)
> ---------------------------------------------------------
>
>                 Key: SOLR-8057
>                 URL: https://issues.apache.org/jira/browse/SOLR-8057
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Blocker
>             Fix For: Trunk
>
>         Attachments: SOLR-8057.patch, SOLR-8057.patch
>
>
> LUCENE-6789 changed the default similarity for IndexSearcher to BM25 and 
> renamed "DefaultSimilarity" to "ClassicSimilarity"
> Solr needs to be updated accordingly:
> * a "ClassicSimilarityFactory" should exist w/expected behavior/javadocs
> * default behavior (in 6.0) when no similarity is specified in configs should 
> (ultimately) use BM25 depending on luceneMatchVersion
> ** either by assuming BM25SimilarityFactory or by changing the internal 
> behavior of DefaultSimilarityFactory
> * comments in sample configs need updated to reflect new default behavior
> * ref guide needs updated anywhere it mentions/implies that a particular 
> similarity is used (or implies TF-IDF is used by default)


