EnglishWordTokenizer vs. WordTokenizer

Daniel Naber Mon, 03 Dec 2012 13:11:11 -0800

Hi,

there are subtle differences between the generic word tokenizer and the 
English one. For example, the English one doesn't use these characters as 
delimiters:


« » — < > \r
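
To make the effect concrete, here is a minimal sketch of how a missing delimiter changes the token list. It assumes a tokenizer built on java.util.StringTokenizer with delimiters returned as tokens; the class DelimiterDemo and both delimiter strings are hypothetical and much shorter than whatever the real tokenizers use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class for illustration; not part of LanguageTool.
final class DelimiterDemo {

    // Assumed, shortened delimiter set shared by both tokenizers.
    static final String BASE_DELIMITERS = " .,;:!?()\n\t";
    // The six characters listed above, written as escapes where needed.
    static final String EXTRA_DELIMITERS = "\u00AB\u00BB\u2014<>\r";

    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true keeps every delimiter as its own token.
        StringTokenizer st = new StringTokenizer(text, delimiters, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "He said \u00ABhi\u00BB";
        // Without the extra delimiters the quoted word stays glued to its quotes.
        System.out.println(tokenize(text, BASE_DELIMITERS));
        // With them, the quotes become separate tokens.
        System.out.println(tokenize(text, BASE_DELIMITERS + EXTRA_DELIMITERS));
    }
}
```

With the shorter set, the guillemets stay attached to the word, so any rule or dictionary lookup that expects the bare word "hi" would presumably not match.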

Does anybody know a good reason for these differences? Judging by the svn 
changelog, the changes made to EnglishWordTokenizer.java do not look 
specific to English.

Regards
 Daniel

-- 
http://www.danielnaber.de


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel