Thank you for your prompt reply; this makes perfect sense.
On 07/06/2016 17:24, Robert Muir wrote:
It's just a heuristic: it does not allow 2 edits
(insertion/deletion/substitution/transposition) to the word if the
first character differs
(https://github.com/apache/lucene-solr/blob/master/lucene/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java#L411).
So when it goes back for n=2, it requires the first character to match.
At least at the time this was written, it had a very large
impact on performance, because otherwise too much of the term
dictionary must be inspected and it's much slower. The idea is that it
won't hurt quality too much, for the same reason that many
string distance functions incorporate a bias towards a
matching prefix (e.g. Jaro-Winkler).
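
To make the heuristic concrete, here is a minimal, self-contained sketch in plain Java. It is not the Lucene code: the class and method names are hypothetical, and the rule "pass n requires a shared prefix of max(minPrefix, n - 1) characters" is an assumption that simply restates the behaviour described above. It only illustrates why "smsng" and "amsung" are corrected while "amung" is not.

```java
// Hypothetical illustration only; not DirectSpellChecker's implementation.
public class PrefixHeuristicSketch {

    // Optimal string alignment distance: counts insertions, deletions,
    // substitutions, and transpositions of adjacent characters.
    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition counts as a single edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed rule: try n = 1, then n = 2, and so on up to maxEdits.
    // Pass n requires the candidate to share the first
    // max(minPrefix, n - 1) characters of the query.
    public static boolean accepts(String query, String candidate,
                                  int minPrefix, int maxEdits) {
        for (int n = 1; n <= maxEdits; n++) {
            int prefix = Math.min(query.length(), Math.max(minPrefix, n - 1));
            if (candidate.startsWith(query.substring(0, prefix))
                    && editDistance(query, candidate) <= n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The three queries from the original question, with minPrefix = 0.
        for (String q : new String[] {"smsng", "amsung", "amung"}) {
            System.out.println(q + " -> " + accepts(q, "samsung", 0, 2));
        }
    }
}
```

Under these assumptions, "amsung" is accepted in the n=1 pass (one insertion, no prefix required), "smsng" is accepted in the n=2 pass because it shares the leading "s", and "amung" is rejected because it needs two edits and its first character differs.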
On Tue, Jun 7, 2016 at 5:20 AM, Caroline Collet
<caroline.col...@pertimm.com> wrote:
Hello,
I am seeing very strange behavior when I use Lucene's
DirectSpellChecker. I have set the prefixLength to 0. I have indexed only
one item with one field: brand=samsung.
I have tried to make requests with spelling mistakes inside.
When I search for "smsng" I obtain "samsung", which is logical
since only 2 corrections are needed to obtain "samsung".
When I search for "amsung" I obtain "samsung", since I have set the
prefixLength to 0.
But when I search for "amung", which also has only 2 errors, I do not
obtain "samsung"; I obtain nothing.
I don't understand this behavior. It is as if no other correction
is permitted when the first letter is misspelled.
Did I miss some parameters of the spellchecker that could explain
this behavior?
For reference, I am using:
- Lucene 5.5.0
- JRE 1.8
Thank you in advance for taking the time to answer my question.
Best regards,
--
PERTIMM <http://www.pertimm.com/fr/>
Caroline Collet
Ingénieur développement
Tel: +33 (0)1 80 04 82 89
caroline.col...@pertimm.com
http://www.pertimm.com/fr/
Pertimm
51, boulevard Voltaire
92600 Asnières-Sur-Seine, France