Here it is https://issues.apache.org/jira/browse/LUCENE-4793 :)
On Thu, Feb 21, 2013 at 9:02 PM, Samuel García Martínez <samuelgmarti...@gmail.com> wrote:

> Yes, of course I can. I'll try to open it tonight (European time) or
> tomorrow as soon as I get to the office.
>
> On Thu, Feb 21, 2013 at 4:14 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
>
>> Samuel,
>>
>> Do you think you could write a failing unit test and open a JIRA issue?
>> Or at the least open a JIRA issue with all the details without a test?
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>> -----Original Message-----
>> From: Samuel García Martínez [mailto:samuelgmarti...@gmail.com]
>> Sent: Thursday, February 21, 2013 2:33 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: possible bug on Spellchecker
>> Importance: Low
>>
>> I'm using Solr 3.6, and DirectSpellChecker is available only on v4+.
>> Moreover, in "big" indexes I prefer using a sidekick index rather than
>> iterating over the term dictionary.
>>
>> On Thu, Feb 21, 2013 at 8:19 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>> > Any reason that you are not using the DirectSpellChecker?
>> >
>> > See:
>> > http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message-----
>> > From: Samuel García Martínez
>> > Sent: Wednesday, February 20, 2013 3:34 PM
>> > To: java-user@lucene.apache.org
>> > Subject: possible bug on Spellchecker
>> >
>> > Hi all,
>> >
>> > Debugging the behaviour of the Solr spellchecker
>> > (IndexBasedSpellchecker, which delegates to the Lucene SpellChecker),
>> > I think I found a bug when the input is a 6-letter word:
>> > - george
>> > - anthem
>> > - argued
>> > - fluent
>> >
>> > Because of getMin() and getMax(), the gram sizes indexed for these
>> > terms are 3 and 4. So, the fields would be something like this:
>> > - for "george"
>> >     - start3: "geo"
>> >     - start4: "geor"
>> >     - end3: "rge"
>> >     - end4: "orge"
>> >     - 3: "geo", "eor", "org", "rge"
>> >     - 4: "geor", "eorg", "orge"
>> > - for "anthem"
>> >     - start3: "ant"
>> >     - start4: "anth"
>> >     - end3: "hem"
>> >     - end4: "them"
>> >
>> > The problem shows up when the user swaps the 3rd and 4th characters,
>> > misspelling the word like this:
>> > - geroge
>> > - anhtem
>> >
>> > The queries generated for these terms are (SHOULD boolean queries):
>> > - for "geroge"
>> >     - start3: "ger"
>> >     - start4: "gero"
>> >     - end3: "oge"
>> >     - end4: "roge"
>> >     - 3: "ger", "ero", "rog", "oge"
>> >     - 4: "gero", "erog", "roge"
>> > - for "anhtem"
>> >     - start3: "anh"
>> >     - start4: "anht"
>> >     - end3: "tem"
>> >     - end4: "htem"
>> >     - 3: "anh", "nht", "hte", "tem"
>> >     - 4: "anht", "nhte", "htem"
>> >
>> > So, as you can see, none of the query grams matches an indexed gram,
>> > and this kind of misspelling never retrieves the suitable suggestion,
>> > although the edit distance is 0.95555556.
>> >
>> > I think getMin(int l) and getMax(int l) should return 2 and 3,
>> > respectively, for l==6. Debugging other values, I did not find any
>> > problem with any kind of misspelling.
>> >
>> > Any thoughts about this?
>> >
>> > --
>> > Regards,
>> > Samuel García
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> --
>> Regards,
>> Samuel García.
>
> --
> Regards,
> Samuel García.

--
Regards,
Samuel García.
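
For anyone who wants to reproduce the gram overlap from the original report without a Solr setup, here is a minimal standalone sketch. It is not the Lucene SpellChecker code: the class name GramOverlapSketch, the methods terms() and sharedTerms(), and the "startN:", "endN:" and "gramN:" term labels are all hypothetical and chosen for illustration. The gram-size range is passed in as parameters, so the 3..4 range reported for 6-letter words can be compared with the proposed 2..3.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene's SpellChecker implementation.
final class GramOverlapSketch {

    /** Builds field-qualified terms (e.g. "start3:geo") for every gram size in [min, max]. */
    static Set<String> terms(String word, int min, int max) {
        Set<String> out = new LinkedHashSet<>();
        for (int n = min; n <= max; n++) {
            if (word.length() < n) {
                continue; // word is too short for this gram size
            }
            out.add("start" + n + ":" + word.substring(0, n));
            out.add("end" + n + ":" + word.substring(word.length() - n));
            for (int i = 0; i + n <= word.length(); i++) {
                out.add("gram" + n + ":" + word.substring(i, i + n));
            }
        }
        return out;
    }

    /** Returns the terms that the indexed word and the query word have in common. */
    static Set<String> sharedTerms(String indexed, String query, int min, int max) {
        Set<String> shared = new LinkedHashSet<>(terms(indexed, min, max));
        shared.retainAll(terms(query, min, max));
        return shared;
    }

    public static void main(String[] args) {
        // Gram sizes 3..4, as reported for 6-letter words: no common term.
        System.out.println(sharedTerms("george", "geroge", 3, 4)); // prints []
        System.out.println(sharedTerms("anthem", "anhtem", 3, 4)); // prints []
        // Proposed sizes 2..3: the 2-gram terms overlap.
        System.out.println(sharedTerms("george", "geroge", 2, 3)); // prints [start2:ge, end2:ge, gram2:ge]
    }
}
```

With an empty intersection, a query built only from SHOULD clauses over these terms cannot retrieve the correctly spelled word as a candidate, which is consistent with the behaviour described in the report.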