[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

James Dyer (JIRA) Tue, 11 Jul 2017 10:52:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082642#comment-16082642
 ]


James Dyer commented on SOLR-10263:
-----------------------------------

Agreed, "maxCollationTries" is expensive, especially if it takes tens of 
tries to find one or more good collations for the user.  The worst case, where 
we spend a lot of time trying possibilities and still come up empty-handed, 
can be frustrating.  On the other hand, spell-check should hopefully only come 
into play for a small percentage of queries.  In those cases, we hope the user 
is willing to wait a bit longer for us to both correct the spelling and return 
results.  This is where it can pay dividends to have a simpler query that 
returns quickly in any case: in the spellcheck scenario the query may have to 
run multiple times, so a faster query could result in a much faster spell check.
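
As a rough illustration of why query speed matters here, the sketch below 
estimates a worst-case spell-check latency.  It assumes each collation try 
costs about as much as running the query once; the class and method names are 
hypothetical and are not part of Solr.

```java
public class CollationCost {
    // Rough upper bound under the stated assumption: the original query runs
    // once, and every collation try re-runs a query of similar cost.
    static long worstCaseMillis(long queryMillis, int maxCollationTries) {
        return queryMillis * (1 + maxCollationTries);
    }

    public static void main(String[] args) {
        // Same number of tries, slow query versus fast query.
        System.out.println(worstCaseMillis(200, 20)); // 4200
        System.out.println(worstCaseMillis(20, 20));  // 420
    }
}
```

With the number of tries held fixed, the total scales directly with the cost 
of a single query.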

I am not convinced a typical Solr user would know in advance -- globally -- 
whether users are more likely to produce a misspelling that already exists in 
the index when it is a word-break misspelling than when it is some other kind.  
I sincerely doubt even you can tell for sure whether a user is more likely to 
hit on a word already in the index when they accidentally combine words versus 
when they accidentally break words.  My thinking is that real data is too 
varied for us to be able to set something like this in the configuration and 
then expect it to be optimal for everyone's queries.

I would recommend you look into creating a synonym list of common misspellings 
of the words in your corpus.  This would be a faster and more reliable way to 
handle the cases you know exist, like "sun glasses > sunglasses".  The 
spellchecker would then serve as a fallback for those cases that are less 
common or that you do not know about.
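
The ordering described above (known corrections first, spellchecker second) 
can be sketched in plain Java.  This is only an illustration of the idea, not 
Solr's synonym handling: the class name, the map contents, and the 
spellCheckFallback function are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class KnownCorrections {
    // Hypothetical hand-maintained list of known misspellings -> corrections.
    static final Map<String, String> KNOWN = Map.of(
        "sun glasses", "sunglasses",
        "gold mine", "goldmine");

    // Try the cheap exact lookup first; call the expensive fallback only on a miss.
    static String correct(String phrase, Function<String, String> spellCheckFallback) {
        String key = phrase.trim().toLowerCase(Locale.ROOT);
        String known = KNOWN.get(key);
        return known != null ? known : spellCheckFallback.apply(key);
    }

    public static void main(String[] args) {
        // Stand-in fallback that returns its input unchanged.
        Function<String, String> fallback = s -> s;
        System.out.println(correct("Sun Glasses", fallback));
        System.out.println(correct("sunglases", fallback));
    }
}
```

The known cases never reach the fallback, so they cost a single map lookup 
regardless of how many collation tries the spellchecker would have needed.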

> Different SpellcheckComponents should have their own suggestMode
> ----------------------------------------------------------------
>
>                 Key: SOLR-10263
>                 URL: https://issues.apache.org/jira/browse/SOLR-10263
>             Project: Solr
>          Issue Type: Wish
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Abhishek Kumar Singh
>            Priority: Minor
>         Attachments: SOLR-10263.v2.patch
>
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
>  We may want *DirectSolrSpellChecker* to always return spellcheck 
> suggestions (SUGGEST_ALWAYS), 
> but we may want *WordBreakSolrSpellChecker* to suggest only if the token is 
> not in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or performance 
> reasons. 
> *UPDATE:* Recently, we also figured out that, within 
> {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} should also 
> have different suggestModes.
> We faced this problem in our case, where most of the WordJoin cases are 
> those in which the words are individually valid tokens, but what the users 
> are looking for is actually a combination (wordjoin) of the two tokens. 
> For example:
> *gold mine sunglasses*: Here, both *gold* and *mine* are valid tokens, but 
> the actual product being looked for is *goldmine sunglasses*, where 
> *goldmine* is a brand.
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
> this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
> {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). 
> For this, we should have separate suggestModes for both `wordJoin` and 
> `wordBreak`. 
> Related changes have been made in the latest PR: 
> https://github.com/apache/lucene-solr/pull/218



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
