Dominique,
As far as I remember (it is documented somewhere), that is what happens when
you try to filter on a non-existent tag. Your rule filters on "N.*", but no
reading of the token has an N.* tag: in your sentence, "eil" is not tagged
as N.
You need something like this, so that the rule only matches when the token
has both a verb reading and a noun reading:
<rule>
<pattern>
<token regexp="yes">u[ln]|a[nlr]</token>
<marker>
<and>
<token postag="V.*" postag_regexp="yes"/>
<token postag="N.*" postag_regexp="yes"/>
</and>
</marker>
</pattern>
<disambig action="filter" postag="N.*"/>
</rule>
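To make the intent of the <and> guard concrete, here is a toy sketch in Python. It is only a model of the idea, not LanguageTool's actual implementation: the function names are made up for illustration, the "eil" readings are copied from your log (without the SENT_END reading), and the second token and its "N m s" tag are hypothetical.

```python
import re

def has_reading(readings, postag_regex):
    """Return True if any (lemma, postag) reading fully matches the regex."""
    return any(re.fullmatch(postag_regex, tag) for _, tag in readings)

def filter_if_ambiguous(readings, required, keep):
    """Keep only the readings matching `keep`, but only when the token has
    at least one reading for every regex in `required` (the <and> guard)."""
    if not all(has_reading(readings, rx) for rx in required):
        return readings  # guard failed: leave the token untouched
    return [r for r in readings if re.fullmatch(keep, r[1])]

# Readings of "eil" as shown in the disambiguator log: no noun reading.
eil = [("eilañ", "V pres 3 s"), ("eilañ", "V impe 2 s"),
       ("eil", "K e sp o"), ("eil", "J")]

# Hypothetical token with both a verb and a noun reading (made-up tags).
both = [("lemma", "V pres 3 s"), ("lemma", "N m s")]

print(filter_if_ambiguous(eil, ["V.*", "N.*"], "N.*"))
print(filter_if_ambiguous(both, ["V.*", "N.*"], "N.*"))
```

In this model "eil" is left alone because it has no N.* reading, while the hypothetical token is reduced to its noun reading.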
Regards,
Jaume Ortolà
2014-09-03 6:22 GMT+02:00 Dominique Pellé <dominique.pe...@gmail.com>:
> Hi
>
> Have a look at the following debug output
> of LanguageTool, where a token gets a nonsensical
> POS tag "N.*" (multiple times) after a disambiguation
> rule is applied.
>
> Is it a bug in the disambiguator?
> Or am I writing an incorrect disambiguation rule?
>
> $ echo "An eil" | java -jar languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar -c utf-8 -l br -v
> Expected text language: Breton
> Working on STDIN...
> 664 rules activated for language Breton
> <S> An[mont/V pres 1 s,monet/V pres 1 s,an/D e sp,]
> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,</S>,]<P/>
> Disambiguator log:
>
> UR_N:2 eil[eilañ/V pres 3 s,eilañ/V impe 2 s,eil/K e sp
> o,eil/J,eilañ/SENT_END] ->
> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/SENT_END]
>
>
> Notice that the token "eil" gets the POS tag "N.*" (which
> is an invalid POS tag; it's not meant to be a regexp) and,
> furthermore, it gets that same POS tag 5 times after
> disambiguation.
>
> The disambiguation rule UR_N:2 in
> languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/disambiguation.xml
> is:
>
> <rule>
> <pattern>
> <token regexp="yes">u[ln]|a[nlr]</token>
> <marker>
> <token postag="V.*" postag_regexp="yes"/>
> </marker>
> </pattern>
> <disambig action="filter" postag="N.*"/>
> </rule>
>
> The idea of the disambiguation rule is that, if the
> word following "an" (or al, or ar, etc.) is a verb (V.*),
> then keep only its noun POS tag (N.*)
> in case it happens to be also a noun.
> But obviously, this is not what's happening here.
>
> Regards
> Dominique
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>