[jira] Commented: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

Geoffrey Young (JIRA) Tue, 01 Jul 2008 07:35:06 -0700

    [ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609607#action_12609607 ]


Geoffrey Young commented on SOLR-606:
-------------------------------------

sure :)

the choice of keywords is intentional.  I don't want word suggestions but 
rather phrase suggestions.

I'm searching almost exclusively over proper names - band names ("celine 
dion"), event names ("wicked: a new musical"), venue names ("staples center"), 
etc.

in my case, it does me zero good to suggest a phrase that doesn't exist, even 
if the word parts do exist independently in my data.

for example...

  o "hannah montana" is an "artist"
  o a user mis-types "hanna montanna"
  o spellchecker thinks "hanna" is spelled correctly (based on the presence of 
"Jake Hanna" among other artists), and suggests "montana" (based on "Montana 
Rangers", etc)
  o spellchecker gives me "hanna montana" as a suggestion... which then also 
misses since it doesn't exist (and the stemmer doesn't seem to catch the 
trailing 'h', but even if it did, there are other examples I can give)

not surprisingly, using keywords instead of raw tokens for the dictionary gives 
me back only "things" that have exact matches, like "hannah montana", or 
"aerosmith" for "arrow smith", "boston red sox" for "boston red socks", etc.

I know I'm not doing what most people are interested in, but it's very 
important for us to match phrases instead of raw words due to the crazy kinds 
of ways bands name themselves.
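
The difference between word-level and phrase-level suggestions can be sketched in a few lines of Python. This is only an illustration: the name list is a made-up stand-in for the indexed data, `token_suggest` and `phrase_suggest` are hypothetical helpers, and `difflib` similarity stands in for whatever distance measure the real spellchecker uses. It is not how the Solr component is implemented.

```python
from difflib import get_close_matches

# Hypothetical catalogue of names, standing in for the indexed data.
NAMES = [
    "hannah montana",
    "jake hanna",
    "montana rangers",
    "aerosmith",
    "boston red sox",
]


def token_suggest(query, names, cutoff=0.8):
    """Correct each word on its own against a dictionary of single words."""
    vocab = sorted({word for name in names for word in name.split()})
    corrected = []
    for word in query.split():
        if word in vocab:
            # The word exists somewhere in the data, so it looks "correct".
            corrected.append(word)
        else:
            matches = get_close_matches(word, vocab, n=1, cutoff=cutoff)
            corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


def phrase_suggest(query, names, cutoff=0.6):
    """Correct the whole query against a dictionary of complete names."""
    matches = get_close_matches(query, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


word_level = token_suggest("hanna montanna", NAMES)
phrase_level = phrase_suggest("hanna montanna", NAMES)
print(word_level, "| exists in data:", word_level in NAMES)
print(phrase_level, "| exists in data:", phrase_level in NAMES)
```

With this toy data the word-level path accepts "hanna" as-is and only repairs "montanna", so it returns a phrase that matches no name, while the phrase-level path can only ever return a complete name from the list.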

fwiw, I found this bug as I was playing around with the new component - for 
the reasons mentioned above I'm not at all interested in the collation feature, 
so I don't consider this a priority for me.  others may stumble upon it, 
though, which is why I reported it.

HTH, and thanks for working out the spelling component in general - it's most 
excellent.



> spellcheck.colate doesn't handle multiple tokens properly
> ---------------------------------------------------------
>
>                 Key: SOLR-606
>                 URL: https://issues.apache.org/jira/browse/SOLR-606
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.3
>         Environment: tomcat
>            Reporter: Geoffrey Young
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
>
> the new spellcheck.collate feature seems to exhibit some strange behaviors 
> when handed a query with multiple tokens.
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>       "q":"redbull air show"}},
>   "spellcheck":{
>    "suggestions":[
>       "redbull",[
>        "suggestion",["redbelly"]],
>       "show",[
>        "suggestion",["shot"]],
>       "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between 
> tokens, left over 'w' from input string)
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>       "q":"redbull air show",
>       "spellcheck.q":"redbull air show"}},
>  "spellcheck":{
>   "suggestions":[
>       "redbull air show",[
>        "suggestion",["redbull singers"]],
>       "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a 
> space, but the collation is way off.
> --Geoff
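
Both collations quoted above are consistent with replacements being spliced into the query using offsets from the original string. The sketch below reproduces them under that assumption. It is a guess at the mechanism, not taken from the Solr source or from SOLR-606.patch, and the function names and `(start, end, replacement)` tuples are hypothetical.

```python
def collate_naive(query, corrections):
    """Splice corrections in left to right, reusing ORIGINAL offsets each time.

    Once an earlier replacement changes the string length, later offsets
    point at the wrong characters.
    """
    out = query
    for start, end, replacement in corrections:
        out = out[:start] + replacement + out[end:]
    return out


def collate_shifted(query, corrections):
    """Same splice, but track how far earlier replacements moved the text."""
    out = query
    shift = 0
    for start, end, replacement in sorted(corrections):
        out = out[:start + shift] + replacement + out[end + shift:]
        shift += len(replacement) - (end - start)
    return out


query = "redbull air show"

# Offsets of "redbull" and "show" in the original query (assumed token spans).
per_token = [(0, 7, "redbelly"), (12, 16, "shot")]
print(collate_naive(query, per_token))    # redbelly airshotw
print(collate_shifted(query, per_token))  # redbelly air shot

# Assumption for the spellcheck.q case: the single token carries a
# zero-length span, so the suggestion is inserted instead of replacing.
zero_length = [(0, 0, "redbull singers")]
print(collate_naive(query, zero_length))  # redbull singersredbull air show
```

Tracking the cumulative shift, or applying the replacements from right to left, keeps every offset valid. The zero-length case would instead need the token's real start and end.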

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
