[RFC][PATCH] Add analyzed token readings to failed bad sentence test output

Silvan Jegen Sun, 10 Aug 2014 09:54:07 -0700

If a rule test fails because no error has been found in the bad example
sentence, one possible reason is that the tokenization of the bad
example sentence does not match the one expected by the rule itself.


To identify these cases more easily, add the token readings to the
assertion message.

Signed-off-by: Silvan Jegen <s.je...@gmail.com>
---

Hi

I had difficulties when creating Japanese rules because the MeCab program
I used to determine the tokenization of the example phrases produced
different tokens than the tokenization library used in LanguageTool.

It took me quite a while to find out why the errors in my bad example
sentences were not found. Having the tokenized readings of the bad
example sentences in the assertion message makes it easier to identify
issues with tokenization.

I realize that this change may be less useful for languages with easier
tokenization, but I still think it would be nice to discuss whether
it makes sense to include this output. Maybe there is existing
functionality in LanguageTool that I do not know of and that would make
the suggested changes unnecessary?

If including the analyzed token readings is useful in other assertion
messages as well, it may also be better to refactor the token reading
code into its own function to make it less ad hoc.
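
As a rough sketch of what such a function could look like: the class and
method names below are placeholders made up for illustration, and the
parameter is typed as Object[] only so that the snippet compiles without
LanguageTool on the classpath. It builds the same string as the loop in
the patch below.

```java
// Hypothetical helper, not part of LanguageTool; names are illustrative only.
final class TokenReadingsMessage {

  private TokenReadingsMessage() {
  }

  /**
   * Joins the string form of each token reading with single spaces, using
   * the same "Analyzed token readings:" prefix as the patch.
   * Typed as Object[] (an assumption for this sketch) so that an
   * AnalyzedTokenReadings[] can be passed in unchanged.
   */
  static String describe(Object[] tokenReadings) {
    final StringBuilder sb = new StringBuilder("Analyzed token readings:");
    for (Object reading : tokenReadings) {
      // Relies on toString() of each reading, exactly like the patch does.
      sb.append(' ').append(reading);
    }
    return sb.toString();
  }
}
```

In PatternRuleTest the call would then presumably be something like
TokenReadingsMessage.describe(analyzedSentence.getTokens()), and other
assertion messages could reuse it.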

What do you think?

(If you want to include the patch, I can open a pull request on GitHub
if you prefer.)


Cheers,

Silvan

 .../org/languagetool/rules/patterns/PatternRuleTest.java  | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/languagetool-core/src/test/java/org/languagetool/rules/patterns/PatternRuleTest.java b/languagetool-core/src/test/java/org/languagetool/rules/patterns/PatternRuleTest.java
index 0d5580d..d279b36 100644
--- a/languagetool-core/src/test/java/org/languagetool/rules/patterns/PatternRuleTest.java
+++ b/languagetool-core/src/test/java/org/languagetool/rules/patterns/PatternRuleTest.java
@@ -22,6 +22,7 @@ import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
 import java.lang.String;
+import java.lang.StringBuilder;
 import java.util.*;
 
 import junit.framework.TestCase;
@@ -281,9 +282,17 @@ public class PatternRuleTest extends TestCase {
       }
       
       if (!rule.isWithComplexPhrase()) {
-        assertTrue(lang + ": Did expect one error in: \"" + badSentence
-            + "\" (Rule: " + rule + "), but found " + matches.size()
-            + ". Additional info:" + rule.getMessage() + ", Matches: " + matches, matches.size() == 1);
+        if (matches.size() != 1) {
+         final AnalyzedSentence analyzedSentence = languageTool.getAnalyzedSentence(badSentence);
+         final AnalyzedTokenReadings[] analyzedTR = analyzedSentence.getTokens();
+         final StringBuilder sb = new StringBuilder("Analyzed token readings:");
+         for (AnalyzedTokenReadings atr : analyzedTR) {
+           sb.append(" " + atr.toString());
+         }
+         assertTrue(lang + ": Did expect one error in: \"" + badSentence
+               + "\" (Rule: " + rule + "), but found " + matches.size()
+               + ". Additional info:" + rule.getMessage() + ", " + sb.toString() + ", Matches: " + matches, matches.size() == 1);
+        }
         assertEquals(lang
                 + ": Incorrect match position markup (start) for rule " + rule + ", sentence: " + badSentence,
                 expectedMatchStart, matches.get(0).getFromPos());
-- 
2.0.4

