Answering my own question: I think it is because Tokenizers work with a Reader, and you would have to read in the whole document in order to use the BreakIterator, which operates on a String...
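To make the trade-off concrete, here is a minimal sketch of what the approach would look like. It is an illustration only, not an existing tokenizer implementation. The class name BreakIteratorTokens and the helpers tokenize and containsLetterOrDigit are hypothetical; only the java.text.BreakIterator calls are standard JDK API. The sketch drains the Reader into a String first, which is exactly the cost described above. BreakIterator.setText also accepts a java.text.CharacterIterator, but the sketch does not explore whether that would avoid buffering the whole document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorTokens {

    // Hypothetical helper: buffers the entire Reader, then splits on word boundaries.
    static List<String> tokenize(Reader reader, Locale locale) throws IOException {
        // Read the whole document into memory; BreakIterator needs the text up front.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String text = sb.toString();

        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            // The iterator also reports whitespace and punctuation runs; skip those.
            if (containsLetterOrDigit(candidate)) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    // Hypothetical filter: keeps a segment only if it has at least one letter or digit.
    static boolean containsLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(new StringReader("Hello, world"), Locale.ENGLISH));
    }
}
```

For large documents the buffering step is the obvious weakness: memory use grows with document size, whereas a Reader-based tokenizer can stream.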
>>> [EMAIL PROTECTED] 07/20/04 03:23PM >>>
Hi,

Was wondering if anyone uses java.text.BreakIterator#getWordInstance(Locale) as a tokenizer for various languages? Does it do a good job? It seems like it does, at least for languages where words are separated by spaces or punctuation, but I have only done simple tests.

Anyone have any thoughts on this? What am I missing? Does this seem like a valid approach?

Thanks,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
