[
https://issues.apache.org/jira/browse/SOLR-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990072#comment-14990072
]
Hoss Man commented on SOLR-8057:
--------------------------------
The more I work on this and think about it, the more I think my current
approach of putting luceneMatchVersion conditional logic in DefaultSimFactory
is the wrong way to go (independent of the bugs that i seem to have uncovered
in making a SimFactory SolrCoreAware - which i'll confirm & file separately)
...
I'm starting to think that a better long term solution would be to split this
up into 3 discrete tasks/ideas...
{panel:title=Task #1 - Deprecate/rename DefaultSimilarityFactory in 5.x}
* clone DefaultSimilarityFactory -> ClassicSimilarityFactory
* prune DefaultSimilarityFactory down to a trivial subclass of
ClassicSimilarityFactory
** make it log a warning on init
* change default behavior of IndexSchema to use ClassicSimilarityFactory
directly
* mark DefaultSimilarityFactory as deprecated in 5.x, remove from trunk/6.0
{panel}
Task #1 would put us in a better position moving forward of having the factory
names directly map to the underlying implementation, leaving less ambiguity
when an explicit factory is specified in the schema.xml (either as the main
similarity, or as a per field similarity).
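A rough sketch of the class relationship Task #1 has in mind. The `*Sketch` classes, the no-arg `init()` and `similarityName()` are hypothetical stand-ins made up for illustration, not the real Solr SimilarityFactory API:

```java
import java.util.logging.Logger;

// Hypothetical stand-ins; these are NOT the real Solr classes or signatures.
final class Task1Sketch {

    /** Stand-in for the new ClassicSimilarityFactory: holds the actual logic. */
    static class ClassicSimilarityFactorySketch {
        void init() {
            // a real factory would read its init params here
        }

        String similarityName() {
            return "ClassicSimilarity";
        }
    }

    /** Stand-in for the deprecated name: a trivial subclass that only warns. */
    @Deprecated
    static class DefaultSimilarityFactorySketch extends ClassicSimilarityFactorySketch {
        private static final Logger LOG = Logger.getLogger("Task1Sketch");

        @Override
        void init() {
            LOG.warning("DefaultSimilarityFactory is deprecated; use ClassicSimilarityFactory");
            super.init();
        }
    }

    public static void main(String[] args) {
        ClassicSimilarityFactorySketch factory = new DefaultSimilarityFactorySketch();
        factory.init();                               // logs the deprecation warning
        System.out.println(factory.similarityName()); // prints "ClassicSimilarity"
    }
}
```

The point is that the old name keeps working for existing configs but contributes nothing beyond the warning, so all behavior lives under the name that matches the underlying implementation.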
{panel:title="Task #2 - Make the wrapped per-field default in
SchemaSimilarityFactory conditional on luceneMatchVersion"}
* use ClassicSimilarity as per-field default when luceneMatchVersion < 6.0
* use BM25Similarity as per-field default when luceneMatchVersion >= 6.0
{panel}
Task #2 would give us better defaults (via BM25) for people using
SchemaSimilarityFactory moving forward, while existing users would have no back
compat change.
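A minimal sketch of the version check Task #2 implies: ClassicSimilarity below 6.0, BM25 from 6.0 on. The `Sim` enum, `majorOf` and `perFieldDefault` are hypothetical names, and the sketch assumes luceneMatchVersion arrives as a dotted string such as "5.3.0"; the real code would presumably compare against Lucene's Version constants instead:

```java
// Hypothetical sketch; the names below are not Solr or Lucene APIs.
final class Task2Sketch {

    /** Stand-ins for the two Lucene similarities discussed above. */
    enum Sim { CLASSIC, BM25 }

    /** Major version from a dotted string such as "5.3.0" (assumed format). */
    static int majorOf(String luceneMatchVersion) {
        return Integer.parseInt(luceneMatchVersion.split("\\.")[0]);
    }

    /** Per-field default: unchanged for old configs, BM25 for 6.0 and later. */
    static Sim perFieldDefault(String luceneMatchVersion) {
        return majorOf(luceneMatchVersion) >= 6 ? Sim.BM25 : Sim.CLASSIC;
    }

    public static void main(String[] args) {
        System.out.println(perFieldDefault("5.3.0")); // prints "CLASSIC"
        System.out.println(perFieldDefault("6.0.0")); // prints "BM25"
    }
}
```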
{panel:title=Task #3 - Change the implicit default Similarity on trunk}
* make the Similarity init logic in IndexSchema conditional on luceneMatchVersion
* use ClassicSimilarityFactory as default when luceneMatchVersion < 6.0
* *use SchemaSimilarityFactory as default when luceneMatchVersion >= 6.0*
** combined with Task #2, this would mean the wrapped per-field default would
be BM25
{panel}
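A sketch of how the implicit default in Task #3 might be chosen. `chooseFactory` and its parameters are hypothetical, not the actual IndexSchema code, and the sketch assumes an explicitly configured factory always takes precedence over the implicit default:

```java
import java.util.Optional;

// Hypothetical sketch; this is not the real IndexSchema init logic.
final class Task3Sketch {

    /**
     * Returns the factory name to use: the configured one if the schema names
     * a similarity, otherwise an implicit default based on the major version.
     */
    static String chooseFactory(Optional<String> configuredFactory, int matchVersionMajor) {
        if (configuredFactory.isPresent()) {
            return configuredFactory.get(); // explicit config is never overridden
        }
        return matchVersionMajor >= 6
                ? "SchemaSimilarityFactory"   // wraps a BM25 per-field default (Task #2)
                : "ClassicSimilarityFactory"; // back compat for older configs
    }

    public static void main(String[] args) {
        System.out.println(chooseFactory(Optional.empty(), 5));
        System.out.println(chooseFactory(Optional.empty(), 6));
        System.out.println(chooseFactory(Optional.of("BM25SimilarityFactory"), 5));
    }
}
```

Only schemas that name no similarity at all are affected by the version check; anything explicit in schema.xml is used as-is.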
Task #3 is where things start to get noticeably different from the goals i
outlined when i originally filed this jira...
As far as i can tell, the chief reason SchemaSimilarityFactory wasn't made the
implicit default in IndexSchema when it was introduced is because of how it
differed/differs from DefaultSimilarity/ClassicSimilarity with respect to
multi-clause queries -- see SchemaSimilarityFactory's class javadoc notes
relating to {{queryNorm}} and {{coord}}. Users were expected to think about
this trade off when making a conscious choice to switch from
DefaultSimilarity/ClassicSimilarity to SchemaSimilarityFactory. But (again,
AFAICT) these discrepancies don't exist between SchemaSimilarityFactory's
PerFieldSimilarityWrapper and BM25Similarity. So if we want to make
BM25Similarity the default when luceneMatchVersion >= 6.0, there doesn't seem
to be any downside to _actually_ making SchemaSimilarityFactory (wrapping
BM25Similarity) the default instead.
----
Task #1 seems like a no brainer to me, and likewise Task #2 seems like a
sensible change balancing new user experience vs backcompat -- so i'm going to
go ahead and move forward with individual sub-tasks to tackle those (in that
order).
If there are no concerns/objections to Task #3 by the time I get to that point,
and if i haven't changed my mind that it's a good idea, I'll move forward with
that as well -- the alternative is to stick with the original plan and make
BM25SimilarityFactory (directly) the default when luceneMatchVersion >= 6.0.
> Change default Sim to BM25 (w/backcompat config handling)
> ---------------------------------------------------------
>
> Key: SOLR-8057
> URL: https://issues.apache.org/jira/browse/SOLR-8057
> Project: Solr
> Issue Type: Task
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Blocker
> Fix For: Trunk
>
> Attachments: SOLR-8057.patch, SOLR-8057.patch
>
>
> LUCENE-6789 changed the default similarity for IndexSearcher to BM25 and
> renamed "DefaultSimilarity" to "ClassicSimilarity"
> Solr needs to be updated accordingly:
> * a "ClassicSimilarityFactory" should exist w/expected behavior/javadocs
> * default behavior (in 6.0) when no similarity is specified in configs should
> (ultimately) use BM25 depending on luceneMatchVersion
> ** either by assuming BM25SimilarityFactory or by changing the internal
> behavior of DefaultSimilarityFactory
> * comments in sample configs need updated to reflect new default behavior
> * ref guide needs updated anywhere it mentions/implies that a particular
> similarity is used (or implies TF-IDF is used by default)