Revision: 5951
          http://languagetool.svn.sourceforge.net/languagetool/?rev=5951&view=rev
Author:   dominikoeo
Date:     2011-11-20 23:06:55 +0000 (Sun, 20 Nov 2011)
Log Message:
-----------
[br] Handle the Unicode left single quotation mark (U+2018) as an apostrophe in the Breton tokenizer.

Modified Paths:
--------------
    trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java

Modified: trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java
===================================================================
--- trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java        2011-11-20 22:56:16 UTC (rev 5950)
+++ trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java        2011-11-20 23:06:55 UTC (rev 5951)
@@ -51,8 +51,8 @@
 
     // FIXME: this is a bit of a hacky way to tokenize.  It should work
     // but I should work on a more elegant way.
-    String replaced = text.replaceAll("([Cc])['’]([Hh])", "$1##BR_APOS##$2")
-                          .replaceAll("(\\p{L})['’]", "$1##BR_APOS## ");
+    String replaced = text.replaceAll("([Cc])['’‘]([Hh])", "$1##BR_APOS##$2")
+                          .replaceAll("(\\p{L})['’‘]", "$1##BR_APOS## ");
 
     final List<String> tokenList = super.tokenize(replaced);
     List<String> tokens = new ArrayList<String>();
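
The effect of the changed lines can be sketched in isolation. The class below is a minimal illustration, not the actual BretonWordTokenizer: the class name BretonApostropheSketch, the helper protectApostrophes and the sample words are assumptions made for this example. It applies the same two replaceAll calls as the "+" lines of the diff, with the quote characters written as Unicode escapes.

```java
public class BretonApostropheSketch {

    // Apostrophe variants accepted after this change: ASCII ', U+2019 and U+2018.
    private static final String APOS = "['\u2019\u2018]";

    // Hypothetical helper mirroring the two replaceAll calls in the diff.
    public static String protectApostrophes(String text) {
        // 1. Keep c'h together: the apostrophe becomes a placeholder with no space.
        // 2. Any other letter + apostrophe: placeholder followed by a space,
        //    so the elided word and the following word split into two tokens.
        return text.replaceAll("([Cc])" + APOS + "([Hh])", "$1##BR_APOS##$2")
                   .replaceAll("(\\p{L})" + APOS, "$1##BR_APOS## ");
    }

    public static void main(String[] args) {
        // Sample inputs use U+2018, the character added in this revision.
        System.out.println(protectApostrophes("c\u2018hoazh")); // prints c##BR_APOS##hoazh
        System.out.println(protectApostrophes("n\u2018eo"));    // prints n##BR_APOS## eo
    }
}
```

Before this revision the character class was ['’] only, so text typed with U+2018 would have passed through both replacements unchanged.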
