Dominique,

As far as I remember (it is documented somewhere), that is what happens when
you filter on a POS tag that the token does not have. Your rule filters on
"N.*", but no reading of the token matches N.*: in your sentence, "eil" is
not tagged as a noun.

You need something like this, so that the rule only applies when the token
has both a V.* and an N.* reading:

    <rule>
      <pattern>
        <token regexp="yes">u[ln]|a[nlr]</token>
        <marker>
           <and>
             <token postag="V.*" postag_regexp="yes"/>
             <token postag="N.*" postag_regexp="yes"/>
           </and>
        </marker>
      </pattern>
      <disambig action="filter" postag="N.*"/>
    </rule>
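
To guard against this coming back, a test sentence could be attached to the
rule. This is only a sketch: as far as I recall, disambiguation rules accept
an <example type="untouched"> element for sentences the rule must leave
alone, but please check that against the disambiguation schema before
relying on it. The sentence is the one from your log; since "eil" has no N.*
reading, the <and> condition should not match and the token should keep its
original readings:

    <rule>
      <pattern>
        <token regexp="yes">u[ln]|a[nlr]</token>
        <marker>
           <and>
             <token postag="V.*" postag_regexp="yes"/>
             <token postag="N.*" postag_regexp="yes"/>
           </and>
        </marker>
      </pattern>
      <disambig action="filter" postag="N.*"/>
      <!-- assumed syntax: sentence that the rule must not modify -->
      <example type="untouched">An eil</example>
    </rule>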


Regards,
Jaume Ortolà




2014-09-03 6:22 GMT+02:00 Dominique Pellé <dominique.pe...@gmail.com>:

> Hi
>
> Have a look at the following debug output
> of LanguageTool, where a token gets the nonsensical
> POS tag "N.*" (multiple times) after a disambiguation
> rule is applied.
>
> Is it a bug in the disambiguator?
> Or am I writing an incorrect disambiguation rule?
>
> $ echo "An eil" | java -jar languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar -c utf-8 -l br -v
> Expected text language: Breton
> Working on STDIN...
> 664 rules activated for language Breton
> <S> An[mont/V pres 1 s,monet/V pres 1 s,an/D e sp,]
> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,</S>,]<P/>
> Disambiguator log:
>
> UR_N:2 eil[eilañ/V pres 3 s,eilañ/V impe 2 s,eil/K e sp o,eil/J,eilañ/SENT_END] ->
> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/SENT_END]
>
>
> Notice that the token "eil" gets the POS tag "N.*" (which
> is an invalid POS tag; it's not meant to be a regexp) and
> furthermore, it gets that same POS tag 5 times after
> disambiguation.
>
> The disambiguation rule UR_N:2 in
>
> languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/disambiguation.xml
> is...
>
>     <rule>
>       <pattern>
>         <token regexp="yes">u[ln]|a[nlr]</token>
>         <marker>
>           <token postag="V.*" postag_regexp="yes"/>
>         </marker>
>       </pattern>
>       <disambig action="filter" postag="N.*"/>
>     </rule>
>
> The idea of the disambiguation rule is that, if the
> word following "an" (or al, or ar, etc.) is a verb (V.*),
> then keep only its noun POS tag (N.*)
> in case it happens to be also a noun.
> But obviously, this is not what's happening here.
>
> Regards
> Dominique
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>