WhitespaceAnalyzer breaks input on whitespace only, so a term such as
alpha-androstane-3 stays a single token.

Otis
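
To make the difference concrete, here is a rough plain-Java sketch of the two tokenization styles discussed in this thread. It does not use Lucene at all: the class TokenizeDemo and both method names are made up for illustration, and the regular expressions only approximate what a whitespace-based analyzer and a "break on non-letter characters" analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Lucene.
public class TokenizeDemo {

    // Approximates whitespace-only tokenization: split on runs of
    // whitespace and leave every other character (-, _, +) untouched.
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Approximates a tokenizer that breaks on anything that is not a
    // letter or digit, which is what splits alpha-androstane-3 apart.
    public static List<String> letterDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "the alpha-androstane-3 sample";
        System.out.println(whitespaceTokens(text));   // [the, alpha-androstane-3, sample]
        System.out.println(letterDigitTokens(text));  // [the, alpha, androstane, 3, sample]
    }
}
```

Note that whitespace-only splitting also keeps punctuation attached to words and does not lowercase anything, so the same kind of analysis should be used at query time as at index time.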
--- Rupinder Singh Mazara <[EMAIL PROTECTED]> wrote:
> Hi
> thanks for the reply
>
> >> my dataset also seems to have a similar problem: the chemical name
> >> alpha-androstane-3, and several others, exist in the given text.
> >> Can anyone point out what is the best strategy to employ so that
> >> words containing - _ + are indexed as they are and not
> >> mutilated?
> >
> >You have to use or write an Analyzer that doesn't tokenize on
> >non-letter or other characters.
>
> Are there any built-in analyzers that do that?
>
> >> currently on my indexes the StandardAnalyzer and QueryParser
> >> break up
> >> alpha-androstane-3
> >> into TEXT:alpha -TEXT:androstane -TEXT:3 , where TEXT is the Field
> >> to be searched
> >
> >Hm, I thought we've fixed QueryParser not to do this. Are you using
> >Lucene 1.4?
>
> no, I guess I will have to
>
> Rupinder
