Hi Raghuram,

If you only want to split words at these characters, create a new class derived from CL_NS(analysis)::CharTokenizer and implement bool isTokenChar(const TCHAR c). Return true if c belongs to a token, and false if it is a delimiter such as an underscore or a dot. You also have to create a new analyzer (derived from CL_NS(analysis)::Analyzer) with its own implementations of tokenStream() and reusableTokenStream(); that is where you instantiate your tokenizer.
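To illustrate the idea, here is a minimal standalone sketch. It does not use the CLucene headers: SketchCharTokenizer, DelimiterTokenizer and DelimiterAnalyzer are hypothetical stand-ins that only mirror the roles of CharTokenizer and Analyzer, and they work on char and std::string rather than TCHAR and a token stream. Treating whitespace as a delimiter is also my assumption. In real code you would derive from the CLucene classes named above and check their headers for the exact signatures.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the role of CL_NS(analysis)::CharTokenizer:
// it walks the input and asks isTokenChar() whether each character
// belongs to a token. This is NOT the CLucene class.
class SketchCharTokenizer {
public:
    virtual ~SketchCharTokenizer() {}

    // Collect maximal runs of characters for which isTokenChar() is true.
    std::vector<std::string> tokenize(const std::string& text) const {
        std::vector<std::string> tokens;
        std::string current;
        for (std::string::size_type i = 0; i < text.size(); ++i) {
            const char c = text[i];
            if (isTokenChar(c)) {
                current += c;               // still inside a token
            } else if (!current.empty()) {
                tokens.push_back(current);  // delimiter ends the token
                current.clear();
            }
        }
        if (!current.empty()) {
            tokens.push_back(current);      // flush the last token
        }
        return tokens;
    }

protected:
    virtual bool isTokenChar(char c) const = 0;
};

// The only part you would normally write yourself: decide which
// characters are delimiters (false) and which are token characters (true).
class DelimiterTokenizer : public SketchCharTokenizer {
protected:
    bool isTokenChar(char c) const {
        switch (c) {
            case '_': case '.': case '@':                // delimiters from the question
            case ' ': case '\t': case '\n': case '\r':   // assumption: whitespace splits too
                return false;
            default:
                return true;
        }
    }
};

// Hypothetical stand-in for the analyzer: its job is just to
// instantiate the tokenizer and hand the text to it.
class DelimiterAnalyzer {
public:
    std::vector<std::string> analyze(const std::string& text) const {
        DelimiterTokenizer tokenizer;
        return tokenizer.tokenize(text);
    }
};
```

With this sketch, DelimiterAnalyzer().analyze("john_doe@example.com") yields the four tokens john, doe, example and com.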
Kind regards,
Veit

2010/4/13 n raghuramireddy <nraghuramire...@gmail.com>:
> Hi
> I am using the standard tokenizer for indexing the data, and I want to
> tokenize the words based on some delimiters like underscore, dot (.),
> at sign (@), etc.
> Where do I have to modify the CLucene code so that the standard tokenizer
> takes care of tokenizing the words?
>
> With regards
> Raghuram
>
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers