On Feb 9, 2010, at 2:36 PM, Itamar Syn-Hershko wrote:

> I'm not sure what you mean.
I mean the ability to know, for a given piece of text, where the token boundaries are (e.g., words).

> CLucene StandardTokenizer is meant for internal use only, and provides the
> calling Analyzer with a stream of identified tokens (it classifies the
> tokens, not just tokenizes them).

Classifies them how? Also, one can plug in one's own tokenizer, yes?

> The ICU tokenizer is a general purpose tokenizer (like Boost's
> implementation is), with loads of extra functionality the CLucene one
> doesn't have or need.

I only care about tokenization of a sequence of characters into words.

- Paul

_______________________________________________
CLucene-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/clucene-developers
