What I actually wanted to try is adding a check to the grammar rule that also 
takes into account the person of each verb, so that we compare the tense of 
verbs only when they are in the same person.

Therefore, instead of adding the exception that Dominique suggested (which, 
by the way, is something we could also try), the rule would not match 

raccolse[raccogliere/VER:ind+past+3+s]


with


viaggio[viaggio/NOUN-M:s,viaggiare/VER:ind+pres+1+s]


because they are not in the same person. I assume that if someone were to make 
a verb tense mistake when writing a sentence, they would at least use the same 
person. 

Any suggestions on how this could be achieved? I guess that it would be much 
easier in Java than with regexps.
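
To make the idea concrete, here is a minimal Java sketch of the person comparison. It is only an illustration working on plain tag strings such as "VER:ind+past+3+s"; the class and method names (PersonCheck, personOf, samePerson) are hypothetical and not part of LanguageTool, and wiring this into an actual Java rule is left open.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of LanguageTool: compares the grammatical
// person encoded in Italian POS tags such as "VER:ind+past+3+s".
class PersonCheck {

    // Returns "1", "2" or "3" for a verb tag that carries a person, or null
    // for non-verbs and for verb forms without a person (e.g. "VER:inf+pres").
    static String personOf(String posTag) {
        if (!posTag.startsWith("VER:")) {
            return null;
        }
        for (String feature : posTag.substring(4).split("\\+")) {
            if (feature.equals("1") || feature.equals("2") || feature.equals("3")) {
                return feature;
            }
        }
        return null;
    }

    // True if some verb reading of the first token and some verb reading of
    // the second token share the same person.
    static boolean samePerson(List<String> firstReadings, List<String> secondReadings) {
        for (String a : firstReadings) {
            String personA = personOf(a);
            if (personA == null) {
                continue;
            }
            for (String b : secondReadings) {
                if (personA.equals(personOf(b))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> raccolse = Arrays.asList("VER:ind+past+3+s");
        List<String> viaggio = Arrays.asList("NOUN-M:s", "VER:ind+pres+1+s");
        List<String> attraverso = Arrays.asList("VER:ind+past+3+s");

        // Third person vs. first person: the tense check would be skipped.
        System.out.println(samePerson(raccolse, viaggio));    // false
        // Both third person: the tense check would still apply.
        System.out.println(samePerson(raccolse, attraverso)); // true
    }
}
```

With this check, "raccolse" and "viaggio" would not be compared at all, while "raccolse" and "attraversò" would.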

Thanks

Paolo


________________________________
 From: Dominique Pellé <dominique.pe...@gmail.com>
To: development discussion for LanguageTool 
<languagetool-devel@lists.sourceforge.net> 
Sent: Thursday, December 27, 2012 11:24 PM
Subject: Re: Italian Language enhancements
 

Mauro Condarelli <mc5...@mclink.it> wrote:


>Hi,
>I'm trying to use LT for Italian.
>There are a lot of false positives in my language, so I started to
>look around to enhance the rules.
>
>I found out that many false positives come from incorrect tagging (to be
>more precise: lack of disambiguation), so I tried to implement some
>very simple disambiguation.
>Unfortunately it doesn't seem to work. At the end of this message you
>will find my changes.
>
>My first test sentence is:
>"Prima di lasciarsi il tempo di pensare troppo raccolse zaino e
>bastone da viaggio e, con un lungo passo determinato, attraversò la
>soglia."
>
>Test results are:
>" Starting check in Italian...
>
>1. Line 1, column 47
>Message: Controllare il tempo dei verbi utilizzati nella frase. (deactivate)
>Context: ...di lasciarsi il tempo di pensare troppo raccolse zaino e bastone
>da viaggio e, con un lungo passo determinato, attr...
>
>Potential problems found: 1 (time: 25ms)"
>
>Which is absolutely wrong because the highlighted part contains just
>one verb ("raccolse").
>
>Tagging gives:
>" <S> Prima[primo/ADJ:pos+f+s, prima/ADV]di[di/PRE]
>lasciarsi[lasciare/VER:inf+pres+si]il[il/ART-M:s]tempo[tempo/NOUN-M:s]
>di[di/PRE]pensare[pensare/VER:inf+pres]
>troppo[troppo/ADV, troppo/ADJ:pos+m+s, troppo/DET-INDEF:m+s]
>raccolse[raccogliere/VER:ind+past+3+s]zaino[zaino/NOUN-M:s]e[e/CON]
>bastone[bastone/NOUN-M:s]da[da/PRE]
>viaggio[viaggio/NOUN-M:s, viaggiare/VER:ind+pres+1+s]e[e/CON],[,/PON]
>con[con/PRE]un[un/ART-M:s]lungo[lungo/ADJ:pos+m+s, lungo/PRE]
>passo[passo/NOUN-M:s, passo/ADJ:pos+m+s, passare/VER:ind+pres+1+s]
>determinato[determinato/ADJ:pos+m+s, determinare/VER:part+past+s+m],[,/PON]
>attraversò[attraversare/VER:ind+past+3+s]la[la/PRO-PERS-CLI-3-F-S, la/ART-F:s]
>soglia[soglia/NOUN-F:s, solere/VER:cond+pres+2+s, solere/VER:cond+pres+1+s, solere/VER:cond+pres+3+s].[./SENT, </S>] "
>
>There's an ambiguity in the word "viaggio" which, taken alone, can
>be either a noun ("trip", the correct meaning in this case) or a
>verb ("I travel"), as correctly stated by tagging.
>I assume this is the reason for the false positive; can someone
>confirm, please?
>
>I thus tried to avoid this particular error by adding the
>disambiguating rules below.
>What I wanted to say is: "a PREposition or ARTicle cannot immediately
>precede a VERb".
>
>Obviously I goofed somewhere because it didn't work (the above
>results are *with* the changes).
>
>Can someone help me, please?
>TiA
>Mauro
>
>
>Index: src/main/java/org/languagetool/language/Italian.java
>===================================================================
>--- src/main/java/org/languagetool/language/Italian.java    (revision 8680)
>+++ src/main/java/org/languagetool/language/Italian.java    (working copy)
>@@ -32,11 +32,14 @@
> import org.languagetool.rules.WordRepeatRule;
> import org.languagetool.rules.it.MorfologikItalianSpellerRule;
> import org.languagetool.tagging.Tagger;
>+import org.languagetool.tagging.disambiguation.Disambiguator;
>+import org.languagetool.tagging.disambiguation.rules.it.ItalianRuleDisambiguator;
> import org.languagetool.tagging.it.ItalianTagger;
> 
> public class Italian extends Language {
> 
>   private Tagger tagger;
>+  private Disambiguator disambiguator;
> 
>   @Override
>   public Locale getLocale() {
>@@ -77,6 +80,14 @@
>   }
> 
>   @Override
>+  public final Disambiguator getDisambiguator() {
>+    if (disambiguator == null) {
>+      disambiguator = new ItalianRuleDisambiguator();
>+    }
>+    return disambiguator;
>+  }
>+
>+  @Override
>   public Contributor[] getMaintainers() {
>     final Contributor contributor = new Contributor("Paolo Bianchini");
>     return new Contributor[] { contributor };
>Index: src/main/java/org/languagetool/tagging/disambiguation/rules/it/ItalianRuleDisambiguator.java
>===================================================================
>--- src/main/java/org/languagetool/tagging/disambiguation/rules/it/ItalianRuleDisambiguator.java    (revision 0)
>+++ src/main/java/org/languagetool/tagging/disambiguation/rules/it/ItalianRuleDisambiguator.java    (revision 0)
>@@ -0,0 +1,32 @@
>+/* LanguageTool, a natural language style checker 
>+ * Copyright (C) 2007 Daniel Naber (http://www.danielnaber.de)
>+ * 
>+ * This library is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU Lesser General Public
>+ * License as published by the Free Software Foundation; either
>+ * version 2.1 of the License, or (at your option) any later version.
>+ *
>+ * This library is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * Lesser General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU Lesser General Public
>+ * License along with this library; if not, write to the Free Software
>+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301
>+ * USA
>+ */
>+
>+package org.languagetool.tagging.disambiguation.rules.it;
>+
>+import org.languagetool.Language;
>+import org.languagetool.tagging.disambiguation.rules.AbstractRuleDisambiguator;
>+
>+public class ItalianRuleDisambiguator extends AbstractRuleDisambiguator {
>+
>+    @Override
>+    protected Language getLanguage() {
>+        return Language.ITALIAN;
>+    }
>+
>+}
>Index: src/main/resources/org/languagetool/resource/it/disambiguation.xml
>===================================================================
>--- src/main/resources/org/languagetool/resource/it/disambiguation.xml    (revision 0)
>+++ src/main/resources/org/languagetool/resource/it/disambiguation.xml    (revision 0)
>@@ -0,0 +1,35 @@
>+<?xml version="1.0" encoding="utf-8"?>
>+<!-- Italian Disambiguation Rules for LanguageTool Copyright (C) 2012 Mauro
>+    Condarelli. See disambiguation.xsd for syntax. $Id: $ -->
>+<rules lang="it" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>+    xsi:noNamespaceSchemaLocation="../disambiguation.xsd">
>+    <rulegroup id="art-ver" name="ART+VER→delete">
>+        <rule>
>+            <pattern>
>+                <token postag="ART"></token>
>+                <marker>
>+                    <token postag="VER"></token>
>+                </marker>
>+            </pattern>
>+            <disambig action="remove" postag="VER"></disambig>
>+        </rule>
>+        <rule>
>+            <pattern>
>+                <token postag="ARTPRE"></token>
>+                <marker>
>+                    <token postag="VER"></token>
>+                </marker>
>+            </pattern>
>+            <disambig action="remove" postag="VER"></disambig>
>+        </rule>
>+        <rule>
>+            <pattern>
>+                <token postag="PRE"></token>
>+                <marker>
>+                    <token postag="VER"></token>
>+                </marker>
>+            </pattern>
>+            <disambig action="remove" postag="VER"></disambig>
>+        </rule>
>+    </rulegroup>
>+</rules>
>
>
>

Ciao Mauro

If you're developing a disambiguator, it's very useful to know that
the -v command line option will give you useful information. It
indicates, among other things, which disambiguator rule(s) match
and which POS tag(s) get assigned to words as a result of
disambiguation. This was added by Marcin a few months ago
and I use it all the time :-)

In my experience, it is also very important to test your
disambiguation rules on several texts and think carefully about 
them, because lousy disambiguation can cause more problems
than it solves. The disambiguation rules you add may fix your 
particular example, but may break other sentences if they match
in unforeseen ways.

In your case, I suppose that you expected this disambiguation
rule to match...

+        <rule>
+            <pattern>
+                <token postag="PRE"></token>
+                <marker>
+                    <token postag="VER"></token>
+                </marker>
+            </pattern>
+            <disambig action="remove" postag="VER"></disambig>
+        </rule>


... but it did not match because the POS tag is "VER:ind+pres+1+s"
(not just "VER").

So you would need to use:

<token postag="VER.*" postag_regexp="yes"></token>
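
The difference can be sketched in plain Java. This is only an illustration and not LanguageTool's actual matching code; it assumes that a postag attribute without postag_regexp="yes" is compared to the whole tag as a literal string, and that with postag_regexp="yes" the regular expression has to match the entire tag. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; PostagMatch is a hypothetical class, not a
// LanguageTool API.
class PostagMatch {

    // Assumption: without postag_regexp="yes" the attribute value must be
    // equal to the complete POS tag.
    static boolean exactMatch(String postagAttribute, String tag) {
        return postagAttribute.equals(tag);
    }

    // Assumption: with postag_regexp="yes" the attribute value is a regular
    // expression that must match the complete POS tag.
    static boolean regexpMatch(String postagAttribute, String tag) {
        return Pattern.matches(postagAttribute, tag);
    }

    public static void main(String[] args) {
        String tag = "VER:ind+pres+1+s";
        System.out.println(exactMatch("VER", tag));            // false
        System.out.println(regexpMatch("VER.*", tag));         // true
        System.out.println(regexpMatch("VER.*", "NOUN-M:s"));  // false
    }
}
```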


But the disambiguation rule seems too general to me anyway. I have
not tried it, but I can imagine that the rule is not strict enough. It will
match something like "da prendere" as in "La strada da prendere…"
even though "prendere" is a verb in that example.


In your example, I think that the grammar rule GR_10_001[4] 
is also not strict enough. It uses several skip="-1" without
<exception> which is dangerous and which matches things in
unexpected ways.
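
To illustrate why an unrestricted skip="-1" is risky, here is a small Java sketch. It is not the real GR_10_001[4] pattern: it assumes, for illustration only, a first token with a past indicative reading followed, after any number of skipped tokens, by a token with a present or future indicative reading. The class name and the two regular expressions are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; SkipSketch is a hypothetical class that mimics
// a two-token pattern whose first token has skip="-1" and no <exception>.
class SkipSketch {

    static final Pattern PAST = Pattern.compile("VER:ind\\+past.*");
    static final Pattern PRES_OR_FUT = Pattern.compile("VER:ind\\+(pres|fut).*");

    // A token matches if any of its readings matches the pattern.
    static boolean anyReading(List<String> readings, Pattern pattern) {
        for (String reading : readings) {
            if (pattern.matcher(reading).matches()) {
                return true;
            }
        }
        return false;
    }

    // Returns {start, end} token indexes of the first match, or null.
    // Every token between start and end is skipped unconditionally.
    static int[] findSpan(List<List<String>> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            if (!anyReading(sentence.get(i), PAST)) {
                continue;
            }
            for (int j = i + 1; j < sentence.size(); j++) {
                if (anyReading(sentence.get(j), PRES_OR_FUT)) {
                    return new int[] {i, j};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // raccolse zaino e bastone da viaggio
        List<List<String>> sentence = Arrays.asList(
            Arrays.asList("VER:ind+past+3+s"),
            Arrays.asList("NOUN-M:s"),
            Arrays.asList("CON"),
            Arrays.asList("NOUN-M:s"),
            Arrays.asList("PRE"),
            Arrays.asList("NOUN-M:s", "VER:ind+pres+1+s"));
        int[] span = findSpan(sentence);
        System.out.println(span[0] + ".." + span[1]); // 0..5
    }
}
```

The span from "raccolse" to "viaggio" in this sketch corresponds to the 35 characters highlighted in the test output.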


$ echo "Prima di lasciarsi il tempo di pensare troppo raccolse zaino e bastone da viaggio e, con un lungo passo determinato, attraversò la soglia." |
    java -jar ~/sb/languagetool/dist/LanguageTool.jar -l it -v
Expected text language: Italian
Working on STDIN...
121 rules activated for language Italian
<S> Prima[primo/ADJ:pos+f+s,prima/ADV]  di[di/PRE]  
lasciarsi[lasciare/VER:inf+pres+si]  il[il/ART-M:s]  tempo[tempo/NOUN-M:s]  
di[di/PRE]  pensare[pensare/VER:inf+pres]  
troppo[troppo/ADV,troppo/ADJ:pos+m+s,troppo/DET-INDEF:m+s]  
raccolse[raccogliere/VER:ind+past+3+s]  zaino[zaino/NOUN-M:s]  e[e/CON]  
bastone[bastone/NOUN-M:s]  da[da/PRE]  
viaggio[viaggio/NOUN-M:s,viaggiare/VER:ind+pres+1+s]  e[e/CON],[,/PON]  
con[con/PRE]  un[un/ART-M:s]  lungo[lungo/ADJ:pos+m+s,lungo/PRE]  
passo[passo/NOUN-M:s,passo/ADJ:pos+m+s,passare/VER:ind+pres+1+s]  
determinato[determinato/ADJ:pos+m+s,determinare/VER:part+past+s+m],[,/PON]  
attraversò[attraversare/VER:ind+past+3+s]  la[la/PRO-PERS-CLI-3-F-S,la/ART-F:s]  
soglia[soglia/NOUN-F:s,solere/VER:cond+pres+2+s,solere/VER:cond+pres+1+s,solere/VER:cond+pres+3+s].[./SENT,</S>]<P/>

Disambiguator log: 

1.) Line 1, column 47, Rule ID: GR_10_001[4]
Message: Controllare il tempo dei verbi utilizzati nella frase.
...rima di lasciarsi il tempo di pensare troppo raccolse zaino e bastone da viaggio e, con un lungo passo determinato, attravers...
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Time: 153ms for 1 sentences (6.5 sentences/sec)


You could make the rule a bit less dangerous this way in case
disambiguation was not good enough:


$ svn diff grammar.xml
Index: grammar.xml
===================================================================
--- grammar.xml    (revision 8681)
+++ grammar.xml    (working copy)
@@ -737,7 +737,10 @@
 <!--
                     <token postag="(VER.ind.imp.*.*)|(VER.ind.fut.*.*)|(VER.ind.pres.*.*)" postag_regexp="yes"><exception scope="previous" postag="(ART-F.*)|(ART-M.*)" postag_regexp="yes"></exception></token>
 -->
-                    <token postag="(VER.ind.fut.*.*)|(VER.ind.pres.*.*)" postag_regexp="yes"><exception scope="previous" postag="(ART-F.*)|(ART-M.*)" postag_regexp="yes"></exception></token>
+                    <token postag="(VER.ind.fut.*.*)|(VER.ind.pres.*.*)" postag_regexp="yes">
+                      <exception postag="NOUN.*" postag_regexp="yes"/>
+                      <exception scope="previous" postag="(ART-F.*)|(ART-M.*)" postag_regexp="yes"></exception>
+                    </token>
 <!-- PB006 - -->
                 </pattern>
                 <message>Controllare il tempo dei verbi utilizzati nella frase.</message>

That removes the error with GR_10_001[4], but there is then still
another false error with rule GR_10_001[1].
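
The effect of the added NOUN exception can be sketched in Java as follows. This is an illustration only, not LanguageTool code: it assumes that a token is rejected as soon as any of its readings matches the exception, it ignores the scope="previous" exception, and the class and method names are hypothetical. The two regular expressions are copied from the diff above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; ExceptionSketch is a hypothetical class.
class ExceptionSketch {

    // Same expressions as in the modified <token> and its new <exception>.
    static final Pattern VERB = Pattern.compile("(VER.ind.fut.*.*)|(VER.ind.pres.*.*)");
    static final Pattern NOUN = Pattern.compile("NOUN.*");

    // Assumption: the token matches if at least one reading is a present or
    // future indicative verb and no reading at all is a noun.
    static boolean tokenMatches(List<String> readings) {
        boolean hasVerbReading = false;
        for (String reading : readings) {
            if (NOUN.matcher(reading).matches()) {
                return false; // the exception wins over any verb reading
            }
            if (VERB.matcher(reading).matches()) {
                hasVerbReading = true;
            }
        }
        return hasVerbReading;
    }

    public static void main(String[] args) {
        // "viaggio" is ambiguous between noun and verb: no longer matched.
        System.out.println(tokenMatches(
            Arrays.asList("NOUN-M:s", "VER:ind+pres+1+s"))); // false
        // A made-up token with only a verb reading is still matched.
        System.out.println(tokenMatches(
            Arrays.asList("VER:ind+pres+3+s")));             // true
    }
}
```

The trade-off is that a real tense error on a word that also has a noun reading is no longer reported.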

Paolo Bianchini wrote:

> The question is: is it better to have false positives or to miss some errors?

Personally, I prefer having few false positives, even if that means missing
some real errors. Of course, ideally you want to reduce false positives
without missing errors, but if there is a choice, I'd say that false
positives are more annoying than missed errors.

The number of false positives should be much smaller than the
number of real errors on a typical text. Of course on a perfect text,
you can only have false positives :-)  In Italian, I see more false
positives than real errors at the moment. Furthermore, false positives
in Italian also often highlight large portions of sentences, which may
hide other real errors. I prefer it when only one or a few words are
highlighted.

If I check a typical article in a newspaper, I would ideally expect no or
very few false positives with LanguageTool.

Regards
-- Dominique


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel