Answering my own question: I think it is because Tokenizers work with a Reader, and 
you would have to read in the whole document in order to use BreakIterator, which 
operates on a String...
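
To make the trade-off concrete, here is a rough sketch of what that would look 
like. The class and method names (BreakIteratorWords, words) are made up for 
illustration and are not part of Lucene. It also assumes that segments with no 
letter or digit (spaces, punctuation) should be dropped, which is my own filtering 
choice and not something BreakIterator does for you. Note that the entire Reader is 
drained into memory before any boundary is found.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorWords {

    // Hypothetical helper (not a Lucene API): buffers the whole Reader,
    // then splits the buffered text on BreakIterator word boundaries.
    static List<String> words(Reader reader, Locale locale) throws IOException {
        // Drain the Reader into memory; this is the cost mentioned above.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String text = sb.toString();

        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Assumption: keep only segments containing a letter or digit,
            // so whitespace and punctuation runs are skipped.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("Hello, world! It works.");
        System.out.println(words(in, Locale.ENGLISH));
    }
}
```

For small fields this buffering is probably harmless, but for large documents it 
gives up the streaming behaviour that a Reader-based Tokenizer is designed around.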

>>> [EMAIL PROTECTED] 07/20/04 03:23PM >>>
Hi,

Was wondering if anyone uses java.text.BreakIterator#getWordInstance(Locale) as a 
tokenizer for various languages?  Does it do a good job?  It seems like it does, at 
least for languages where words are separated by spaces or punctuation, but I have 
only done simple tests.

Anyone have any thoughts on this?  What am I missing?  Does this seem like a valid 
approach?

Thanks,
Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 


