Re: [CLucene-dev] Tokenizing the words based on some special delemeters

Veit Jahns Wed, 14 Apr 2010 06:28:58 -0700

Hi Raghuram,

if you only want to split words at these characters, create a new
class derived from CL_NS(analysis)::CharTokenizer and implement bool
isTokenChar(const TCHAR c). Return true if c belongs to a token and
false if it does not, e.g. for underscore, dot, etc. You also have to
create a new analyzer (derived from CL_NS(analysis)::Analyzer) with its
own implementations of tokenStream() and reusableTokenStream(), in
which you instantiate your tokenizer.
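
To illustrate the isTokenChar() idea, here is a minimal standalone sketch.
It does not use the CLucene headers: SimpleCharTokenizer is a hypothetical
stand-in for the role CharTokenizer plays, and DelimiterTokenizer, tokenize()
and the use of wchar_t instead of TCHAR are assumptions made only so that the
example compiles on its own. Treating whitespace as a separator is also my
assumption. In real code you would derive from CL_NS(analysis)::CharTokenizer
and check the exact signature in your CLucene version.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT CLucene's class: it groups consecutive
// characters for which isTokenChar() returns true into one token.
class SimpleCharTokenizer {
public:
    virtual ~SimpleCharTokenizer() {}

    std::vector<std::wstring> tokenize(const std::wstring& text) const {
        std::vector<std::wstring> tokens;
        std::wstring current;
        for (wchar_t c : text) {
            if (isTokenChar(c)) {
                current += c;               // still inside a token
            } else if (!current.empty()) {
                tokens.push_back(current);  // a separator ends the token
                current.clear();
            }
        }
        if (!current.empty()) {
            tokens.push_back(current);      // flush the last token
        }
        return tokens;
    }

protected:
    // The only method a derived tokenizer has to supply.
    virtual bool isTokenChar(wchar_t c) const = 0;
};

// Hypothetical tokenizer that splits at underscore, dot and the at sign.
class DelimiterTokenizer : public SimpleCharTokenizer {
protected:
    bool isTokenChar(wchar_t c) const override {
        if (c == L'_' || c == L'.' || c == L'@') {
            return false;                   // special delimiters
        }
        // Assumption: whitespace separates tokens as well.
        return !std::iswspace(static_cast<wint_t>(c));
    }
};
```

With this predicate, an input such as L"john_doe@example.com" is split at
every underscore, dot and at sign, so no token contains one of these
characters.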


Kind regards,

Veit

2010/4/13 n raghuramireddy <nraghuramire...@gmail.com>:
> Hi
>  I am using the standard tokenizer for indexing the data and I want to
> tokenize the words at delimiters such as underscore (_), dot (.),
> the at sign (@), etc.
>  Where do I have to modify the CLucene code so that the standard tokenizer
> tokenizes the words this way?
>
> With regards
> Raghuram
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>

