Valery, have you tried using WhitespaceTokenizer / CharTokenizer and doing any further processing in a custom TokenFilter?
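To make the suggestion concrete, here is a minimal sketch of the two-stage idea in plain Java. It deliberately does not use the Lucene classes: `WhitespaceThenFilter`, `tokenize`, `filter` and `keep` are hypothetical names made up for illustration, and the rule "keep '+' and '#' at token edges" is only an assumption about what you want. In a real analyzer the second stage would live in a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names are Lucene API.
class WhitespaceThenFilter {

    // Stage 1: split on whitespace only, so "C++," stays one raw token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2 (the "filter"): strip leading and trailing punctuation,
    // but treat '+' and '#' as part of the token so "C++" and "C#" survive.
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            int start = 0;
            int end = t.length();
            while (start < end && !keep(t.charAt(start))) {
                start++;
            }
            while (end > start && !keep(t.charAt(end - 1))) {
                end--;
            }
            if (start < end) {
                out.add(t.substring(start, end));
            }
        }
        return out;
    }

    // Assumption: letters, digits, '+' and '#' are the characters worth keeping.
    static boolean keep(char c) {
        return Character.isLetterOrDigit(c) || c == '+' || c == '#';
    }
}
```

With this sketch "C/C++" stays a single token, because only the edges are trimmed. Whether that is acceptable comes back to Robert's question of whether a search for "C" should match it.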
simon

On Thu, Aug 20, 2009 at 8:48 PM, Robert Muir<rcm...@gmail.com> wrote:
> Valery, I think it all depends on how you want your search to work.
>
> When I say this, I mean for example: if a document only contains "C++",
> do you want searches on just "C" to match or not?
>
> Another thing I would suggest is to take a look at the capabilities of
> Solr: it has some analysis stuff that might be beneficial for your
> needs.
> The wiki page is here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>
> On Thu, Aug 20, 2009 at 1:46 PM, Valery<khame...@gmail.com> wrote:
>>
>> Hi Robert,
>>
>> so, would you expect a Tokenizer to consider '/' in both cases as a separate
>> Token?
>>
>> Personally, I see no problem if the Tokenizer did the following job:
>>
>> "C/C++" ==> TokenStream of { "C", "/", "C", "+", "+"}
>> and came up with "C" and "C++" tokens after processing through the
>> downstream TokenFilters.
>>
>> Similarly:
>>
>> "SAP R/3" ==> TokenStream of { "SAP", "R", "/", "3"}
>> and getting { "SAP", "R", "/", "3", "R/3", "SAP R/3"} later.
>>
>> I try to follow the spirit that a token (or its lexeme) usually should never be
>> parsed again. One can build more complex (compound) things from the tokens.
>> However, usually one never chops a lexeme into smaller pieces.
>>
>> What do you think, Robert?
>>
>> regards,
>> Valery
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Any-Tokenizator-friendly-to-C%2B%2B%2C-C-%2C-.NET%2C-etc---tp25063175p25066762.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Robert Muir
> rcm...@gmail.com
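
For comparison, the approach Valery describes in the quoted message (tokenize finely first, then build compounds downstream) might look roughly like the sketch below. It is again plain Java rather than Lucene code, and `FineThenMerge`, `tokenize`, `merge` and `isWordLike` are hypothetical names. Two simplifications are assumed: whitespace between tokens is ignored when merging, so "C + +" would also become "C++", and "/" is simply dropped rather than used to emit compounds such as "R/3".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names are Lucene API.
class FineThenMerge {

    // Fine-grained pass: a run of letters/digits is one token,
    // every other non-whitespace character is a token of its own.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < text.length() && Character.isLetterOrDigit(text.charAt(j))) {
                    j++;
                }
                tokens.add(text.substring(i, j));
                i = j;
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    // Downstream pass: glue "+" or "#" onto the preceding word-like token
    // and drop "/" separators. Tokens are never split again.
    static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean suffix = t.equals("+") || t.equals("#");
            int last = out.size() - 1;
            if (suffix && last >= 0 && isWordLike(out.get(last))) {
                out.set(last, out.get(last) + t);
            } else if (!t.equals("/")) {
                out.add(t);
            }
        }
        return out;
    }

    // A token counts as word-like if it starts with a letter or digit.
    static boolean isWordLike(String token) {
        return !token.isEmpty() && Character.isLetterOrDigit(token.charAt(0));
    }
}
```

Here `tokenize("C/C++")` yields the five tokens from the example in the quoted message, and `merge` turns them into "C" and "C++". Emitting the additional compounds for "SAP R/3" would need a further pass that is not shown.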