Revision: 7858
          http://languagetool.svn.sourceforge.net/languagetool/?rev=7858&view=rev
Author:   dominikoeo
Date:     2012-08-13 19:52:26 +0000 (Mon, 13 Aug 2012)
Log Message:
-----------
- The word tokenizer now considers the pipe | and backtick ` characters as word separators. I intended to make the star * a word separator too, but that would currently break 2 rules: the German rule LEERZEICHEN_RECHENZEICHEN and the Italian rule GR_09.
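
For readers who want to see the effect described in the log message, here is a minimal sketch. It is not the actual WordTokenizer class: the class name SeparatorSketch is made up, the delimiter string is shortened to the space plus the two new characters, and passing returnDelims = true to StringTokenizer is an assumption, since that argument is not visible in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for illustration only; not the real WordTokenizer.
class SeparatorSketch {

  // Shortened delimiter set: space (U+0020), backtick (U+0060), pipe (U+007C).
  // The real tokenizer lists many more separator characters.
  private static final String SEPARATORS = "\u0020\u0060\u007c";

  static List<String> tokenize(final String text) {
    final List<String> tokens = new ArrayList<String>();
    // Assumption: separators are returned as tokens of their own (returnDelims = true).
    final StringTokenizer st = new StringTokenizer(text, SEPARATORS, true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }

  public static void main(final String[] args) {
    // The pipe and the backticks now split the surrounding words.
    System.out.println(tokenize("foo|bar `baz`"));
    // The star is not in the delimiter set, so "2*3" stays one token.
    System.out.println(tokenize("2*3"));
  }
}
```

With this delimiter set, "foo|bar" yields three tokens (foo, |, bar), while "2*3" remains a single token because the star is not a separator.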

Modified Paths:
--------------
    trunk/JLanguageTool/CHANGES.txt
    trunk/JLanguageTool/src/java/org/languagetool/tokenizers/WordTokenizer.java

Modified: trunk/JLanguageTool/CHANGES.txt
===================================================================
--- trunk/JLanguageTool/CHANGES.txt 2012-08-13 18:27:55 UTC (rev 7857)
+++ trunk/JLanguageTool/CHANGES.txt 2012-08-13 19:52:26 UTC (rev 7858)
@@ -66,6 +66,8 @@
  -HTTP API: the XML output has been extended to include the category of the match
+ -The word tokenizer now considers the following characters as word separator: | (pipe)
+  and` (backtick).
 1.8 (2012-06-30)

Modified: trunk/JLanguageTool/src/java/org/languagetool/tokenizers/WordTokenizer.java
===================================================================
--- trunk/JLanguageTool/src/java/org/languagetool/tokenizers/WordTokenizer.java 2012-08-13 18:27:55 UTC (rev 7857)
+++ trunk/JLanguageTool/src/java/org/languagetool/tokenizers/WordTokenizer.java 2012-08-13 19:52:26 UTC (rev 7858)
@@ -37,7 +37,7 @@
   public List<String> tokenize(final String text) {
     final List<String> l = new ArrayList<String>();
     final StringTokenizer st = new StringTokenizer(text,
-            "\u0020\u00A0\u115f\u1160\u1680"
+            "\u0020\u0060\u007c\u00A0\u115f\u1160\u1680"
           + "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007"
           + "\u2008\u2009\u200A\u200B\u200c\u200d\u200e\u200f"
           + "\u2028\u2029\u202a\u202b\u202c\u202d\u202e\u202f"

_______________________________________________
Languagetool-cvs mailing list
Languagetool-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-cvs