[ 
https://issues.apache.org/jira/browse/SOLR-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett resolved SOLR-10314.
--------------------------------------
       Resolution: Information Provided
    Fix Version/s:     (was: 6.7)
                       (was: 7.0)

I'm going to close this issue as it seems there isn't a ton that can be done 
about it - it's expected behavior for the most part, and the solution would be 
much larger than this narrow use case.

> Spellcheck with SnowballPorterFilterFactory and Synonyms doesn't work well
> --------------------------------------------------------------------------
>
>                 Key: SOLR-10314
>                 URL: https://issues.apache.org/jira/browse/SOLR-10314
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Cassandra Targett
>
> As noted in SOLR-10252, the default spellcheck configuration in the 
> data_driven_schema_configs (and basic_configs) uses the {{\_text_}} field as 
> the default field for spellcheck. This field is of the {{text_general}} field 
> type.
> If I use this default configuration for spellcheck, but modify the 
> {{text_general}} field to use the SnowballPorterFilterFactory (with 
> language=German in this case), and have synonyms in my analysis chain, 
> queries to the {{/spell}} request handler will fail when there are 2 or more 
> terms that are each preceded by a {{+}} operator. 
> Note that the default spellcheck configuration also enables 
> spellcheck.collate - if I disable that, I do not get any error. I also do not 
> get an error if I use only 1 term, even if it is spelled "correctly". If at 
> least one of the terms is spelled incorrectly, that also does not give an 
> error.
> So, in summary, there's a pretty specific list of variables at work here:
> # {{/spell}} request handler
> # 2 or more terms, all spelled correctly (or, all terms exist in the index)
> # all terms required with {{+}}
> # synonyms (there is a big list in this case, which I cannot share...see 
> SOLR-10252 for an example of the parsed query to see how big the list can get)
> # SnowballPorterFilter
> # spellcheck.collate=true
> The error returned is: 
> {code}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://localhost:7574/solr/spelltest3_shard1_replica2: String 
> index out of range: -1
> {code}
> I ran several experiments and found that if synonyms are removed from the 
> field type (and thus the query analysis chain), the query is successful with 
> collations enabled. So it's not SnowballPorterFilter by itself, but with 
> {{+}} and synonyms and collation.
> The field type definition is:
> {code}
>   <fieldType name="text_general" class="solr.TextField" 
> positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" 
> ignoreCase="true"/>
>       <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" 
> ignoreCase="true"/>
>       <filter class="solr.SynonymFilterFactory" expand="true" 
> ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     </analyzer>
>   </fieldType>
> {code}
> This problem was found with 5.5.2, but I verified it still exists in 6.4 and 
> 6.5.
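
The conditions listed in the report (the {{/spell}} handler, two or more terms each required with {{+}}, and spellcheck.collate=true) can be sketched as a request URL. This is only an illustrative sketch, not the reporter's actual request: the base URL, the collection name "spelltest", the terms "haus" and "garten", and the helper build_spell_url are all hypothetical, and the real failure also depends on the synonym list and SnowballPorterFilter in the query analysis chain, which are not reproduced here.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_spell_url(base_url, terms, collate=True):
    """Build a /spell request URL in which every term is required with '+'.

    Hypothetical helper for illustration only; it builds the URL and does
    not send any request to Solr.
    """
    # Prefix each term with '+' so that all terms are required.
    query = " ".join("+" + term for term in terms)
    params = {
        "q": query,
        "spellcheck": "true",
        # The report sees the error only when collation is enabled.
        "spellcheck.collate": "true" if collate else "false",
        "wt": "json",
    }
    # urlencode percent-encodes '+' as %2B and encodes spaces as '+'.
    return base_url.rstrip("/") + "/spell?" + urlencode(params)


# Assumed collection URL and placeholder terms (not taken from the report).
url = build_spell_url("http://localhost:8983/solr/spelltest", ["haus", "garten"])
print(url)

# Decode the query string again to confirm the '+' operators survive encoding.
print(parse_qs(urlsplit(url).query)["q"])
```

Passing collate=False builds the variant that, according to the report, does not produce the error.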


