Revision: 6634
http://languagetool.svn.sourceforge.net/languagetool/?rev=6634&view=rev
Author: dominikoeo
Date: 2012-03-24 12:29:14 +0000 (Sat, 24 Mar 2012)
Log Message:
-----------
- No longer consider the ellipsis (…) as a sentence separator.
It was causing false positives at least in French and Breton
as in "Mais… c'est mon ami." (no upper case after ellipsis).
An ellipsis does not always separate sentences. Later, I will
change French and Breton to use the SRX tokenizer (but only
after the 1.7 release).
Modified Paths:
--------------
trunk/JLanguageTool/src/java/org/languagetool/tokenizers/SentenceTokenizer.java
Modified:
trunk/JLanguageTool/src/java/org/languagetool/tokenizers/SentenceTokenizer.java
===================================================================
--- trunk/JLanguageTool/src/java/org/languagetool/tokenizers/SentenceTokenizer.java 2012-03-23 19:11:39 UTC (rev 6633)
+++ trunk/JLanguageTool/src/java/org/languagetool/tokenizers/SentenceTokenizer.java 2012-03-24 12:29:14 UTC (rev 6634)
@@ -38,7 +38,7 @@
// end of sentence marker:
protected static final String EOS = "\0";
//private final static String EOS = "#"; // for testing only
- protected static final String P = "[\\.!?…]"; // PUNCTUATION
+ protected static final String P = "[\\.!?]"; // PUNCTUATION
protected static final String AP = "(?:'|«|\"||\\)|\\]|\\})?"; // AFTER PUNCTUATION
protected static final String PAP = P + AP;
protected static final String PARENS = "[\\(\\)\\[\\]]"; // parentheses
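
To illustrate the effect of the change, here is a minimal sketch comparing the old and new values of P on the example from the log message. Only the two character classes come from the diff; the class name EllipsisBoundaryDemo, the method countBoundaryCandidates, and the rule "punctuation followed by whitespace" are assumptions made for illustration and are much simpler than what SentenceTokenizer actually does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EllipsisBoundaryDemo {
    // Character classes as they appear in the diff (\u2026 is the ellipsis).
    static final String P_OLD = "[\\.!?\u2026]"; // rev 6633
    static final String P_NEW = "[\\.!?]";       // rev 6634

    // Hypothetical, simplified rule (NOT the real tokenizer logic):
    // a boundary candidate is one punctuation character followed by whitespace.
    static int countBoundaryCandidates(String punctuationClass, String text) {
        Matcher m = Pattern.compile(punctuationClass + "\\s").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Mais\u2026 c'est mon ami. D'accord.";
        // With the old class, the ellipsis after "Mais" is also a candidate.
        System.out.println("old: " + countBoundaryCandidates(P_OLD, text));
        System.out.println("new: " + countBoundaryCandidates(P_NEW, text));
    }
}
```

Under this simplified rule the old class yields one more candidate than the new one, namely the position after "Mais…", which is the split that led to the false "no upper case" report.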
_______________________________________________
Languagetool-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/languagetool-cvs