Hi, I was wondering whether anyone uses java.text.BreakIterator#getWordInstance(Locale) as a tokenizer for various languages. Does it do a good job? It seems to, at least for languages where words are separated by spaces or punctuation, but I have only run simple tests.
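To make the question concrete, here is a minimal sketch of the kind of tokenizer I mean. The class name BreakIteratorTokenizer and the tokenize helper are made up for illustration, and the rule that a segment counts as a word only if it contains a letter or digit is my own assumption, not something BreakIterator prescribes.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorTokenizer {

    // Hypothetical helper: walks the word boundaries reported by BreakIterator
    // and keeps only segments that contain at least one letter or digit,
    // which drops whitespace and punctuation segments (an assumed filter).
    public static List<String> tokenize(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! Does it work?", Locale.ENGLISH));
    }
}
```

With Locale.ENGLISH this should yield the five words of the sample sentence with the punctuation and spaces dropped. Results for languages that do not separate words with spaces may depend on the JDK's built-in rules, so this is only a starting point for testing.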
Anyone have any thoughts on this? What am I missing? Does this seem like a valid approach? Thanks, Grant
