On Feb 9, 2010, at 2:36 PM, Itamar Syn-Hershko wrote:

> I'm not sure what you mean.
I mean the ability to know, for a given piece of text, where the token boundaries are (e.g., words).

> CLucene StandardTokenizer is meant for internal use only, and provides the
> calling Analyzer with a stream of identified tokens (it classifies the
> tokens, not just tokenizes them).

Classifies them how? Also, one can plug in one's own tokenizer, yes?

> The ICU tokenizer is a general purpose tokenizer (like Boost's
> implementation is), with loads of extra functionality the CLucene one
> doesn't have or need.

I only care about tokenization of a sequence of characters into words.

- Paul

_______________________________________________
CLucene-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/clucene-developers
