Any reason that you are not using the DirectSpellChecker?
See:
http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html
-- Jack Krupansky
-----Original Message-----
From: Samuel García Martínez
Sent: Wednesday, February 20, 2013 3:34 PM
To: java-user@lucene.apache.org
Subject: possible bug on Spellchecker
Hi all,
While debugging the Solr spellchecker (IndexBasedSpellChecker, which delegates
to the Lucene SpellChecker), I think I found a bug that appears when the input
is a 6-letter word, for example:
- george
- anthem
- argued
- fluent
Because of getMin() and getMax(), the n-grams indexed for these terms have
sizes 3 and 4. So the fields would look something like this:
- for "*george*"
  - start3: "geo"
  - start4: "geor"
  - end3: "rge"
  - end4: "orge"
  - 3: "geo", "eor", "org", "rge"
  - 4: "geor", "eorg", "orge"
- for "*anthem*"
  - start3: "ant"
  - start4: "anth"
  - end3: "hem"
  - end4: "them"
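
As an illustration of how the fields above are derived, here is a minimal sketch in plain Java. It is not the Lucene implementation: the class name NGramSketch and the helper grams() are hypothetical, and the sizes 3 and 4 are simply the values quoted above for 6-letter words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the n-gram fields for one word.
final class NGramSketch {

    /** Returns every contiguous substring of length n, in order of position. */
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        String word = "george";
        // Sizes 3 and 4 are taken from the message; they apply to 6-letter words.
        for (int n = 3; n <= 4; n++) {
            List<String> g = grams(word, n);
            System.out.println("start" + n + ": " + g.get(0));            // first gram
            System.out.println("end" + n + ": " + g.get(g.size() - 1));   // last gram
            System.out.println(n + ": " + g);                             // all grams
        }
    }
}
```

The startN and endN fields are just the first and last gram of each size, so they can only match when the corresponding gram matches.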
The problem shows up when the user swaps the 3rd and 4th characters,
misspelling the word like this:
- geroge
- anhtem
The queries generated for these terms are (boolean queries with SHOULD clauses):
- for "*geroge*"
  - start3: "ger"
  - start4: "gero"
  - end3: "oge"
  - end4: "roge"
  - 3: "ger", "ero", "rog", "oge"
  - 4: "gero", "erog", "roge"
- for "*anhtem*"
  - start3: "anh"
  - start4: "anht"
  - end3: "tem"
  - end4: "htem"
  - 3: "anh", "nht", "hte", "tem"
  - 4: "anht", "nhte", "htem"
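
The lack of overlap can be checked mechanically. The sketch below is plain Java, not Lucene code; GramOverlapSketch, swapThirdAndFourth() and shared() are hypothetical names. It applies the same swap to all four example words and collects the grams of sizes 3 and 4 that the misspelling shares with the original.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical check, not part of Lucene: which grams survive a 3rd/4th character swap?
final class GramOverlapSketch {

    /** Swaps the 3rd and 4th characters (indexes 2 and 3) of a word. */
    static String swapThirdAndFourth(String word) {
        char[] c = word.toCharArray();
        char tmp = c[2];
        c[2] = c[3];
        c[3] = tmp;
        return new String(c);
    }

    /** Collects all grams of sizes min..max found in the word. */
    static Set<String> grams(String word, int min, int max) {
        Set<String> out = new LinkedHashSet<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(word.substring(i, i + n));
            }
        }
        return out;
    }

    /** Returns the grams that both words have in common. */
    static Set<String> shared(String a, String b, int min, int max) {
        Set<String> result = grams(a, min, max);
        result.retainAll(grams(b, min, max));
        return result;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"george", "anthem", "argued", "fluent"}) {
            String typo = swapThirdAndFourth(word);
            System.out.println(word + " / " + typo + " share " + shared(word, typo, 3, 4));
        }
    }
}
```

Because the start and end fields hold grams drawn from the same set, an empty intersection here means that none of the SHOULD clauses can match the correctly spelled term.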
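So, as you can see, this kind of misspelling never matches the suitable
suggestions, because no gram is shared with the correct word, even though the
string distance score (a similarity, where 1.0 is an exact match) is 0.95555556.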
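
The message does not say which string distance produced 0.95555556. The value is consistent with a Jaro-Winkler similarity, so the sketch below assumes that measure. It is a textbook implementation written for illustration (hypothetical class JaroWinklerSketch, scaling factor 0.1, prefix capped at 4), not the Lucene class.

```java
// Textbook Jaro-Winkler similarity; an assumption about the measure in use, not Lucene code.
final class JaroWinklerSketch {

    static double jaro(String s1, String s2) {
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order (half transpositions).
        int k = 0, halfTranspositions = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - halfTranspositions / 2.0) / m) / 3.0;
    }

    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int prefix = 0;
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j); // boost for a shared prefix
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("george", "geroge"));
        System.out.println(jaroWinkler("anthem", "anhtem"));
    }
}
```

For both pairs all six characters match, one transposition is counted, and the shared prefix has length 2, which gives 43/45, or about 0.9556.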
I think getMin(int l) and getMax(int l) should return 2 and 3,
respectively, for l == 6. Debugging other values, I did not find any problem
with any kind of misspelling.
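
To see what the proposed sizes would change, the following sketch compares the two settings for the example words. It is plain Java with hypothetical names (ProposedGramSizesSketch, sharedGrams) and does not patch or call Lucene; it only counts which grams the misspelling and the correct word would have in common.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of gram sizes 3..4 (current) and 2..3 (proposed) for 6-letter words.
final class ProposedGramSizesSketch {

    /** Returns the distinct grams of sizes min..max that occur in both words. */
    static List<String> sharedGrams(String a, String b, int min, int max) {
        List<String> shared = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= a.length(); i++) {
                String gram = a.substring(i, i + n);
                // A gram of a also belongs to b exactly when it is a substring of b.
                if (b.contains(gram) && !shared.contains(gram)) {
                    shared.add(gram);
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"george", "geroge"}, {"anthem", "anhtem"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1]
                + " current (3..4): " + sharedGrams(p[0], p[1], 3, 4)
                + " proposed (2..3): " + sharedGrams(p[0], p[1], 2, 3));
        }
    }
}
```

With sizes 2 and 3, "george" and "geroge" share "ge", and "anthem" and "anhtem" share "an" and "em", so at least the start and end 2-gram clauses would match. Whether that is enough for the suggestion to rank well is a separate question that this sketch does not address.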
Any thoughts about this?
--
Un saludo,
Samuel García