Which analyzer in particular are you using? It's probably not doing what you want for Hindi. These "diacritics" are important (vowel signs, etc.); they are part of the word, not optional decoration.
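
To make the byte sequences in the quoted message concrete, here is a minimal sketch in plain Java (no Lucene involved). It decodes the two sequences and shows that the extra three bytes are a separate code point, U+0947 (a Devanagari vowel sign), which Java classifies as a combining mark rather than a letter. The class and method names (HindiCodePoints, utf8Hex, isCombiningMark) are made up for illustration. Whether your analyzer actually splits tokens at such marks is an assumption you would need to check against the analyzer you have configured.

```java
import java.nio.charset.StandardCharsets;

public class HindiCodePoints {

    // Returns the UTF-8 encoding of s as space-separated hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the code point is in one of the Unicode mark categories (Mn, Mc, Me).
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // the bare consonant from the question
        String ke = "\u0915\u0947";  // same consonant followed by a vowel sign

        System.out.println(utf8Hex(ka)); // E0 A4 95
        System.out.println(utf8Hex(ke)); // E0 A4 95 E0 A5 87

        // Walk the second string one code point at a time and classify each.
        ke.codePoints().forEach(cp -> System.out.printf(
                "U+%04X letter=%b mark=%b%n",
                cp, Character.isLetter(cp), isCombiningMark(cp)));
    }
}
```

If a tokenizer only keeps characters it considers letters, a word containing U+0947 would be cut at the mark, and the bare consonant would end up indexed as its own term. That would explain the match you are seeing, and it is why the choice of analyzer matters here.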
On Fri, Jul 10, 2009 at 3:10 PM, OBender<osya_ben...@hotmail.com> wrote:
> Hi All,
>
> I'm using the default setup of lucene (no custom analyzers configured) and
> came across the following issue:
>
> In Hindi if there is a letter with a diacritic in a phrase lucene will find
> the phrase with this letter even if the search string is for the letter
> without a diacritics.
>
> Is this an expected behavior? Maybe this is standard for all languages with
> letters that have diacritics?
>
> From pure byte standpoint I can see the logic, the letter with diacritics
> takes 6 bytes (E0 A4 95 E0 A5 87) and the single letter takes 3 (E0 A4 95)
> so if I search for *some_letter* where some letter has code (E0 A4 95)
> lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.
>
> Any comments much appreciated.
>
> Thanks.

--
Robert Muir
rcm...@gmail.com