Revision: 5951
http://languagetool.svn.sourceforge.net/languagetool/?rev=5951&view=rev
Author: dominikoeo
Date: 2011-11-20 23:06:55 +0000 (Sun, 20 Nov 2011)
Log Message:
-----------
[br] handle Unicode quote U+2018 in Breton tokenizer.
Modified Paths:
--------------
trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java
Modified: trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java
===================================================================
--- trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java 2011-11-20 22:56:16 UTC (rev 5950)
+++ trunk/JLanguageTool/src/java/org/languagetool/tokenizers/br/BretonWordTokenizer.java 2011-11-20 23:06:55 UTC (rev 5951)
@@ -51,8 +51,8 @@
// FIXME: this is a bit of a hacky way to tokenize. It should work
// but I should work on a more elegant way.
- String replaced = text.replaceAll("([Cc])['’]([Hh])", "$1##BR_APOS##$2")
- .replaceAll("(\\p{L})['’]", "$1##BR_APOS## ");
+ String replaced = text.replaceAll("([Cc])['’‘]([Hh])", "$1##BR_APOS##$2")
+ .replaceAll("(\\p{L})['’‘]", "$1##BR_APOS## ");
final List<String> tokenList = super.tokenize(replaced);
List<String> tokens = new ArrayList<String>();
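
The effect of the changed line can be checked in isolation. The following is only a minimal sketch, not the tokenizer itself: the class name BretonApostropheSketch and the method markApostrophes are hypothetical names introduced for illustration, while the two regular expressions and the ##BR_APOS## placeholder are copied from the added lines of the diff above. The quote characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
// Hypothetical standalone class; not part of LanguageTool.
final class BretonApostropheSketch {

    // Applies the same two replacements as the line added in rev 5951.
    // The character class now accepts ' (U+0027), U+2019 and U+2018.
    static String markApostrophes(String text) {
        return text
            // c'h is a single Breton letter group: keep it in one token.
            .replaceAll("([Cc])['\u2019\u2018]([Hh])", "$1##BR_APOS##$2")
            // Any other letter + apostrophe: mark it and add a space
            // so the following word becomes a separate token.
            .replaceAll("(\\p{L})['\u2019\u2018]", "$1##BR_APOS## ");
    }

    public static void main(String[] args) {
        // U+2018 inside c'h, the case this commit adds.
        System.out.println(markApostrophes("c\u2018hoant"));
        // U+2018 after an ordinary letter.
        System.out.println(markApostrophes("d\u2018an ti"));
        // The ASCII apostrophe was already handled before this revision.
        System.out.println(markApostrophes("c'hi"));
    }
}
```

With the previous character class (rev 5950), the first two inputs would pass through unchanged because U+2018 was not matched.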