Thank you for your prompt reply; this makes perfect sense.

On 07/06/2016 17:24, Robert Muir wrote:
It's just a heuristic: it does not allow 2 edits (insertion/deletion/substitution/transposition) to the word if the first character differs (https://github.com/apache/lucene-solr/blob/master/lucene/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java#L411). So when it goes back for n=2, it requires the first character to match.

At least at the time this was written, it had a very large impact on performance, because otherwise too much of the term dictionary must be inspected and it's much slower. The idea is that it won't hurt quality too much, for the same reason that many string distance functions incorporate a bias towards a matching prefix (e.g. Jaro-Winkler).
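
To make the heuristic concrete, here is a minimal, self-contained sketch. It does not use Lucene and is not the DirectSpellChecker code; the class and method names are hypothetical. It assumes the behaviour described above can be modeled as one pass per edit count n, where pass n requires the first max(prefixLength, n - 1) characters to match, so n=2 requires a matching first character even when prefixLength is 0.

```java
public class PrefixHeuristicSketch {

    // Optimal string alignment distance: insertion, deletion,
    // substitution, and adjacent transposition each cost 1.
    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" <-> "ba".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // True if both strings are at least len long and agree on their first len characters.
    public static boolean sharesPrefix(String a, String b, int len) {
        return a.length() >= len && b.length() >= len
                && a.regionMatches(0, b, 0, len);
    }

    // ASSUMPTION: models the heuristic as "pass n requires a matching
    // prefix of max(minPrefix, n - 1) characters". This is an illustration
    // of the behaviour described in this thread, not Lucene's implementation.
    public static boolean suggests(String query, String term, int minPrefix, int maxEdits) {
        for (int n = 1; n <= maxEdits; n++) {
            int requiredPrefix = Math.max(minPrefix, n - 1);
            if (sharesPrefix(query, term, requiredPrefix)
                    && editDistance(query, term) <= n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The three queries from the original question, with prefixLength = 0.
        for (String q : new String[] {"smsng", "amsung", "amung"}) {
            System.out.println(q + " -> " + suggests(q, "samsung", 0, 2));
        }
    }
}
```

Under this model, "smsng" (2 edits, same first letter) and "amsung" (1 edit) both reach "samsung", while "amung" (2 edits, different first letter) does not, which matches the behaviour reported in the question.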


On Tue, Jun 7, 2016 at 5:20 AM, Caroline Collet <caroline.col...@pertimm.com <mailto:caroline.col...@pertimm.com>> wrote:

    Hello,

    I am seeing very strange behavior when I use Lucene's
    DirectSpellChecker. I have set the prefixLength to 0. I have indexed
    only one item with one field: brand=samsung.
    I have tried to make requests with spelling mistakes inside.

    When I search for "smsng" I obtain "samsung", which is logical
    since only 2 corrections are needed to obtain "samsung".
    When I search for "amsung" I obtain "samsung", since I have set the
    prefixLength to 0.
    But when I search for "amung", which has only 2 errors, I do not
    obtain "samsung"; I obtain nothing.

    I don't understand this behaviour, it is like no other correction
    is permitted if the first letter is misspelled.

    Did I miss some parameters of the spellchecker that could explain
    this behavior?

    To be precise, I am using:
    - Lucene 5.5.0
    - JRE 1.8

    Thank you in advance for taking time to answer my question,
    Best regards,
-- PERTIMM <http://www.pertimm.com/fr/>

    Caroline Collet
    Development Engineer

    Tel : +33 (0)1 80 04 82 89 <tel:%2B33%20%280%291%2080%2004%2082%2089>
    caroline.col...@pertimm.com <mailto:caroline.col...@pertimm.com>
    http://www.pertimm.com/fr/

        

    Pertimm
    51, boulevard Voltaire
    92600 Asnières-Sur-Seine, France



