Jack Krupansky created SOLR-4277:
------------------------------------

             Summary: Spellchecker sometimes falsely reports a spelling error and correction
                 Key: SOLR-4277
                 URL: https://issues.apache.org/jira/browse/SOLR-4277
             Project: Solr
          Issue Type: Bug
          Components: SearchComponents - other
    Affects Versions: 4.0
            Reporter: Jack Krupansky


In some cases, the Solr spell checker improperly reports query terms as being 
misspelled.

Using the Solr example for 4.0, I added these mini documents:

{code}
curl "http://localhost:8983/solr/update?commit=true" \
  -H 'Content-type:application/csv' -d '
id,name
spel-1,aardvark abacus ball bill cat cello
spel-2,abate accord band bell cattle check
spel-3,adorn border clean clock'
{code}

I then issued this request:

{code}
curl "http://localhost:8983/solr/spell/?q=check&indent=true"
{code}

The spell checker falsely concluded that "check" was misspelled and improperly 
corrected it to "clock":

{code}
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="check">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">clock</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">clock</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="check">clock</str>
      </lst>
    </lst>
  </lst>
</lst>
{code}
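The telling detail in this response is that "origFreq" is 1 (the term is in the index) while a suggestion is still returned. As a minimal sketch, assuming the response fragment is available as a string, the following Python flags such terms. The function name and the abbreviated SAMPLE constant are illustrative only and are not part of Solr:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response above (collation omitted for brevity).
SAMPLE = """<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="check">
      <int name="numFound">1</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">clock</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>"""

def indexed_terms_with_suggestions(spellcheck_xml):
    """Return {term: [suggested words]} for terms whose origFreq > 0."""
    root = ET.fromstring(spellcheck_xml)
    flagged = {}
    for suggestions in root.iter("lst"):
        if suggestions.get("name") != "suggestions":
            continue
        # Direct <lst> children are per-term entries (or the collation,
        # which has no origFreq and is therefore skipped).
        for entry in suggestions.findall("lst"):
            orig_freq = entry.find("int[@name='origFreq']")
            if orig_freq is None or int(orig_freq.text) == 0:
                continue
            words = [w.text for w in entry.findall(
                "arr[@name='suggestion']/lst/str[@name='word']")]
            if words:
                flagged[entry.get("name")] = words
    return flagged

print(indexed_terms_with_suggestions(SAMPLE))
```

Any term this returns exists in the index and was nevertheless offered a correction, which is the symptom described here.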

And if I query for "clock", it gets corrected to "check"!

{code}
curl "http://localhost:8983/solr/spell/?q=clock&indent=true"
{code}

{code}
  <lst name="suggestions">
    <lst name="clock">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">check</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">check</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="clock">check</str>
      </lst>
    </lst>
  </lst>
{code}
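The two terms are suggested for each other, which is consistent with them being close in edit distance. As a rough illustration, the sketch below computes plain Levenshtein distance in Python. Plain Levenshtein is an assumption made here for illustration; the distance measure the spell checker actually uses may differ:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]

print(levenshtein("check", "clock"))  # 2: h->l and e->o
```

A distance of 2 means each term is a plausible candidate correction for the other under a two-edit limit, if such a limit is configured.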

Note: This appears to happen only because "clock" is so close to "check". In a 
multi-term query, the other terms do not receive suggestions:

{code}
curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true"
{code}

{code}
  <lst name="suggestions">
    <lst name="check">
      <int name="numFound">1</int>
      <int name="startOffset">13</int>
      <int name="endOffset">18</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">clock</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">cattle abate clock</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="cattle">cattle</str>
        <str name="abate">abate</str>
        <str name="check">clock</str>
      </lst>
    </lst>
  </lst>
{code}

However, the response inappropriately lists "cattle" and "abate" in the 
"misspellingsAndCorrections" section even though no suggestions were offered 
for them.
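To make the spurious entries easy to spot, a client could separate real corrections from terms that map to themselves. This is only a sketch in Python, assuming the collation fragment is available as a string. The function name and the COLLATION constant are illustrative:

```python
import xml.etree.ElementTree as ET

# The collation element from the response above.
COLLATION = """<lst name="collation">
  <str name="collationQuery">cattle abate clock</str>
  <int name="hits">2</int>
  <lst name="misspellingsAndCorrections">
    <str name="cattle">cattle</str>
    <str name="abate">abate</str>
    <str name="check">clock</str>
  </lst>
</lst>"""

def split_corrections(collation_xml):
    """Return (changed, unchanged): real corrections vs. self-mapped terms."""
    root = ET.fromstring(collation_xml)
    pairs = root.find("lst[@name='misspellingsAndCorrections']")
    changed, unchanged = {}, []
    if pairs is None:
        return changed, unchanged
    for item in pairs.findall("str"):
        original, correction = item.get("name"), item.text
        if original == correction:
            unchanged.append(original)  # listed as "misspelled" but not altered
        else:
            changed[original] = correction
    return changed, unchanged

changed, unchanged = split_corrections(COLLATION)
print(changed)
print(unchanged)
```

Here "cattle" and "abate" land in the unchanged list, matching the entries that arguably should not appear at all.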

Finally, I can work around this issue by removing the following line from 
solrconfig.xml:

{code}
      <str name="spellcheck.alternativeTermCount">5</str>
{code}

With that line removed, the previous request returns:

{code}
  <lst name="suggestions">
    <bool name="correctlySpelled">false</bool>
  </lst>
{code}

This makes the original problem go away. It does, however, raise the question 
of why my 100% correct query is still tagged as "correctlySpelled" = "false", 
but that's a separate Jira.

