[ 
https://issues.apache.org/jira/browse/SOLR-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973702#comment-13973702
 ] 

Ed Smiley commented on SOLR-5995:
---------------------------------

Example:
Each returned correction is the expected word with a single spurious character appended (the character's Unicode code point, in hex, is shown in square brackets).
{code}
correction=attitude[2d]
correction=attitude[2f]
correction=attitude[2026]
{code}

These spurious characters are:
* Unicode Character 'HYPHEN-MINUS' (U+002D)
* Unicode Character 'SOLIDUS' (U+002F)
* Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026)
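
The code points listed above can be double-checked against the Unicode database. The following is only a minimal sketch in Python; the helper name `describe_trailing_char` is made up for illustration, and the three input strings are the corrections reported in this comment.

```python
import unicodedata

def describe_trailing_char(correction: str) -> str:
    """Return the code point and Unicode name of the last character."""
    last = correction[-1]
    return f"U+{ord(last):04X} {unicodedata.name(last, 'UNKNOWN')}"

# The three corrections reported in this comment (the last ends in U+2026).
corrections = ["attitude-", "attitude/", "attitude\u2026"]
descriptions = [describe_trailing_char(c) for c in corrections]

for correction, description in zip(corrections, descriptions):
    print(f"{correction!r}: {description}")
```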

The duplicate suggestions usually, but not always, come in groups of three.

We can reproduce the behavior without SolrJ; see the collation and 
misspellingsAndCorrections entries in the response below, e.g.:
... 
solr/pg1/spell?q=+doc-id:(810500)+AND+attitudex&spellcheck=true&spellcheck.count=10&spellcheck.collate=true&spellcheck.collateExtendedResults=true&wt=json&qt=%2Fspell&shards.qt=%2Fspell&shards.tolerant=true.out.print
{code}
{"responseHeader":{"status":0,"QTime":60},
 "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
 "spellcheck":{"suggestions":[
   "attitudex",{"numFound":6,"startOffset":21,"endOffset":30,"origFreq":0,"suggestion":[
     {"word":"attitudes","freq":362486},
     {"word":"attitu dex","freq":4819},
     {"word":"atti tudex","freq":3254},
     {"word":"attit udex","freq":159},
     {"word":"attitude-","freq":1080},
     {"word":"attituden","freq":261}]},
   "correctlySpelled",false,
   "collation",["collationQuery"," doc-id:(810500) AND attitude-","hits",2,"misspellingsAndCorrections",["attitudex","attitude-"]],
   "collation",["collationQuery"," doc-id:(810500) AND attitude/","hits",2,"misspellingsAndCorrections",["attitudex","attitude/"]],
   "collation",["collationQuery"," doc-id:(810500) AND attitude…","hits",2,"misspellingsAndCorrections",["attitudex","attitude…"]]]}}
{code}
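
To spot these collations automatically, one option is to walk the flat key/value lists in the JSON response and flag any correction whose last character is not a letter or digit. This is only a rough sketch: the function names `named_list_pairs` and `suspicious_corrections` are hypothetical, the sample is an abridged copy of the response above, and the "not alphanumeric" test is a heuristic that assumes legitimate terms in the index end in a letter or digit.

```python
import json

def named_list_pairs(flat):
    """Turn a flat [k1, v1, k2, v2, ...] list into a list of (key, value) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def suspicious_corrections(response: dict):
    """Collect (original, correction) pairs whose correction ends in a
    non-alphanumeric character (heuristic, see the assumptions above)."""
    found = []
    for key, value in named_list_pairs(response["spellcheck"]["suggestions"]):
        if key != "collation":
            continue  # skip per-word suggestions and the correctlySpelled flag
        collation = dict(named_list_pairs(value))
        pairs = named_list_pairs(collation["misspellingsAndCorrections"])
        for original, correction in pairs:
            if correction and not correction[-1].isalnum():
                found.append((original, correction))
    return found

# Abridged copy of the response above; only the fields used here are kept.
sample = json.loads("""
{"spellcheck": {"suggestions": [
  "attitudex", {"numFound": 6, "origFreq": 0},
  "correctlySpelled", false,
  "collation", ["collationQuery", " doc-id:(810500) AND attitude-", "hits", 2,
                "misspellingsAndCorrections", ["attitudex", "attitude-"]],
  "collation", ["collationQuery", " doc-id:(810500) AND attitude/", "hits", 2,
                "misspellingsAndCorrections", ["attitudex", "attitude/"]],
  "collation", ["collationQuery", " doc-id:(810500) AND attitude\\u2026", "hits", 2,
                "misspellingsAndCorrections", ["attitudex", "attitude\\u2026"]]
]}}
""")

flagged = suspicious_corrections(sample)
for original, correction in flagged:
    print(f"{original} -> {correction!r}")
```

All three collations in the sample are flagged, which matches the three spurious characters listed earlier.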

The configuration is:
{code}
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">text</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>       
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>       
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>  
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>         
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

<lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>      
      <str name="field">text</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">25</int>
      <int name="minBreakLength">3</int>
</lst>

<lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.2</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">25</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">1</float>
</lst>
{code}




> Spurious spellcheck results
> ---------------------------
>
>                 Key: SOLR-5995
>                 URL: https://issues.apache.org/jira/browse/SOLR-5995
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 4.5.1, 4.7
>         Environment: Linux/multi-shard
>            Reporter: Ed Smiley
>
> This does not happen in all cases, but behavior is consistent when it does.
> Here is a short description of the two, closely related problem cases:
> 1. Some correctly spelled words are reported as misspelled, and the 
> suggestions are multiple copies of the original, correctly spelled word, each 
> with a single oddball character appended.  
> 2. Some incorrectly spelled words also return multiple suggestions 
> that are copies of the same word, each with a single oddball character 
> appended.  Without the oddball character, that word is a good 
> correction for the original misspelling.
> I will be attaching some more information to clarify the details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

