[ 
https://issues.apache.org/jira/browse/SOLR-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960976#comment-15960976
 ] 

James Dyer commented on SOLR-10314:
-----------------------------------

SpellingQueryConverter does not generally work with stemmed fields.  The query 
converter is pluggable, and users can write their own class if they need custom 
behavior.  They can also pass the raw keywords they want checked in 
"spellcheck.q", which bypasses the query converter entirely.  Users periodically 
complain about esoteric error messages from this class, and it should probably 
log a more helpful message in these cases.  It would be nice if we could 
rethink how spell checking works and possibly dispense with the need for a 
QueryConverter, but personally I have yet to think of a good solution.
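
As a rough illustration of the "spellcheck.q" workaround, the sketch below 
strips the {{+}}/{{-}} operators from a user query on the client side and 
sends the remaining keywords as "spellcheck.q".  The helper names and the 
sample terms are hypothetical, and the operator stripping is an assumption 
that only covers simple whitespace-separated terms; it is not how Solr itself 
parses queries.

```python
from urllib.parse import urlencode


def raw_keywords(user_query: str) -> str:
    """Return plain keywords with leading +/- operators and quotes removed.

    Assumption: terms are whitespace-separated and carry no field prefixes,
    boosts, or nested clauses.
    """
    terms = []
    for token in user_query.split():
        cleaned = token.lstrip("+-").strip('"')
        if cleaned:
            terms.append(cleaned)
    return " ".join(terms)


def spell_request_params(user_query: str) -> str:
    """Build a URL-encoded parameter string for a spellcheck request.

    "q" keeps the original query; "spellcheck.q" carries only the raw
    keywords, so the query converter is not needed to extract them.
    """
    params = {
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.q": raw_keywords(user_query),
        "spellcheck.collate": "true",
    }
    return urlencode(params)
```

The resulting string could be appended to a {{/spell}} request URL, with "q" 
still holding the original query including its operators.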

> Spellcheck with SnowballPorterFilterFactory and Synonyms doesn't work well
> --------------------------------------------------------------------------
>
>                 Key: SOLR-10314
>                 URL: https://issues.apache.org/jira/browse/SOLR-10314
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Cassandra Targett
>             Fix For: 5.5, 6.4
>
>
> As noted in SOLR-10252, the default spellcheck configuration in the 
> data_driven_schema_configs (and basic_configs) uses the {{\_text_}} field as 
> the default field for spellcheck. This field has the {{text_general}} field type.
> If I use this default configuration for spellcheck, but modify the 
> {{text_general}} field to use the SnowballPorterFilterFactory (with 
> language=German in this case), and have synonyms in my analysis chain, 
> queries to the {{/spell}} request handler will fail when there are 2 or more 
> terms that are each preceded by a {{+}} operator. 
> Note that the default spellcheck configuration also enables 
> spellcheck.collate - if I disable that, I do not get any error. I also do not 
> get an error if I use only 1 term, even if it is spelled "correctly". If at 
> least one of the terms is spelled incorrectly, that also does not give an 
> error.
> So, in summary, there's a pretty specific list of variables at work here:
> # {{/spell}} request handler
> # 2 or more terms, both spelled correctly (or, both terms exist in the index)
> # all terms required with {{+}}
> # synonyms (there is a big list in this case, which I cannot share...see 
> SOLR-10252 for an example of the parsed query to see how big the list can get)
> # SnowballPorterFilter
> # spellcheck.collate=true
> The error returned is: 
> {code}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://localhost:7574/solr/spelltest3_shard1_replica2: String 
> index out of range: -1
> {code}
> I made several experiments and found that if synonyms are removed from the 
> field type (and thus the query analysis chain), the query is successful with 
> collations enabled. So it's not SnowballPorterFilter by itself, but with 
> {{+}} and synonyms and collation.
> The field type definition is:
> {code}
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     </analyzer>
>   </fieldType>
> {code}
> This problem was found with 5.5.2, but I verified it still exists in 6.4 and 
> 6.5.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
