[
https://issues.apache.org/jira/browse/SOLR-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cassandra Targett resolved SOLR-10314.
--------------------------------------
Resolution: Information Provided
Fix Version/s: (was: 6.7)
(was: 7.0)
I'm going to close this issue, since there isn't much that can be done about
it: it's expected behavior for the most part, and a real fix would have to be
much broader than this narrow use case.
> Spellcheck with SnowballPorterFilterFactory and Synonyms doesn't work well
> --------------------------------------------------------------------------
>
> Key: SOLR-10314
> URL: https://issues.apache.org/jira/browse/SOLR-10314
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: spellchecker
> Reporter: Cassandra Targett
>
> As noted in SOLR-10252, the default spellcheck configuration in the
> data_driven_schema_configs (and basic_configs) uses the {{_text_}} field as
> the default field for spellcheck. This field has the {{text_general}} field type.
> If I use this default configuration for spellcheck, but modify the
> {{text_general}} field to use the SnowballPorterFilterFactory (with
> language=German in this case), and have synonyms in my analysis chain,
> queries to the {{/spell}} request handler will fail when there are 2 or more
> terms that are all preceded by a {{+}} operator.
> Note that the default spellcheck configuration also enables
> spellcheck.collate - if I disable that, I do not get any error. I also do not
> get an error if I use only 1 term, even if it is spelled "correctly". If at
> least one of the terms is spelled incorrectly, that also does not give an
> error.
> So, in summary, there's a pretty specific list of variables at work here:
> # {{/spell}} request handler
> # 2 or more terms, all spelled correctly (or, all terms exist in the index)
> # all terms required with {{+}}
> # synonyms (there is a big list in this case, which I cannot share...see
> SOLR-10252 for an example of the parsed query to see how big the list can get)
> # SnowballPorterFilter
> # spellcheck.collate=true
> The error returned is:
> {code}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:7574/solr/spelltest3_shard1_replica2: String
> index out of range: -1
> {code}
> I ran several experiments and found that if synonyms are removed from the
> field type (and thus the query analysis chain), the query is successful with
> collations enabled. So it's not SnowballPorterFilter by itself, but its
> combination with {{+}}, synonyms, and collation.
> The field type definition is:
> {code}
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.SnowballPorterFilterFactory" language="German"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.SynonymFilterFactory" expand="true"
> ignoreCase="true" synonyms="synonyms.txt"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SnowballPorterFilterFactory" language="German"/>
> </analyzer>
> </fieldType>
> {code}
> This problem was found with 5.5.2, but I verified it still exists in 6.4 and
> 6.5.
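
For anyone trying to reproduce this, the request-side conditions listed in the description (the {{/spell}} handler, 2 or more terms all required with {{+}}, and spellcheck.collate=true) can be sketched as a small URL builder. This is only an illustration, not the query from the original report: the collection name "spelltest3" is guessed from the core name in the error message, the terms "haus" and "garten" are hypothetical placeholders, and the helper name build_spell_url is made up for this sketch. The synonyms and SnowballPorterFilter conditions live in the field type and are not covered here.

```python
from urllib.parse import urlencode


def build_spell_url(base_url, collection, terms, collate=True):
    """Build a /spell request URL in which every term is required with '+'.

    base_url, collection, and terms are assumptions supplied by the caller;
    none of them come from the original report's actual query.
    """
    # Prefix each term with '+' so that all terms are required.
    query = " ".join("+" + term for term in terms)
    params = {
        "q": query,
        "spellcheck": "true",
        # Collation is the setting the report says triggers the error;
        # pass collate=False to compare against the non-failing case.
        "spellcheck.collate": "true" if collate else "false",
        "wt": "json",
    }
    # urlencode escapes the literal '+' as %2B and encodes spaces as '+'.
    return f"{base_url}/{collection}/spell?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical terms; substitute two terms that both exist in the index.
    print(build_spell_url("http://localhost:7574/solr", "spelltest3",
                          ["haus", "garten"]))
```

Calling the helper once with collate=True and once with collate=False gives the two requests to compare, since the report says the error disappears when collation is disabled.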