Thanks Jaume. I'll fix that disambiguation rule soon.

However, if it's not too hard, it would be better in my opinion
for <disambig action="filter" postag="..."/> to be a no-op when the
postag regex does not match any reading of the token. It would make
disambiguation rules simpler. It would also be less dangerous: it's
easy to make the same mistake as I did. I suspect that several other
disambiguation rules, in several languages, have the same kind of
mistake. I only found this one by chance while looking at the -v
debug output of LanguageTool. I'm not aware of a systematic
way to find all other similar mistakes.
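
To make the suggestion concrete, here is a minimal Python sketch of the
filter semantics I have in mind. It is only a model of the proposal, not
LanguageTool's actual code: the function name filter_readings and the
(lemma, postag) tuples are hypothetical, and the readings of "eil" are
copied from the disambiguator log quoted below (SENT_END omitted).

```python
import re


def filter_readings(readings, postag_pattern):
    """Keep only the readings whose POS tag fully matches postag_pattern.

    Hypothetical model of the proposal: when no reading matches, the
    readings are returned unchanged (a no-op) instead of being rewritten.
    """
    regex = re.compile(postag_pattern)
    kept = [(lemma, tag) for lemma, tag in readings if regex.fullmatch(tag)]
    return kept if kept else list(readings)


# Readings of "eil" before disambiguation, taken from the -v log.
eil = [
    ("eilañ", "V pres 3 s"),
    ("eilañ", "V impe 2 s"),
    ("eil", "K e sp o"),
    ("eil", "J"),
]

# "eil" has no N reading, so the filter leaves the token untouched.
print(filter_readings(eil, "N.*"))

# A pattern that does match keeps only the matching readings.
print(filter_readings(eil, "V.*"))
```

With these semantics, rule UR_N:2 would simply leave "eil" alone rather
than producing the bogus "N.*" readings.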

Regards
Dominique


Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:

> Dominique,
>
> As far as I remember (it is documented somewhere), that is what happens when
> you try to filter a non-existent tag. You try to filter "N.*" but there is
> no N.* tag in the token. In your sentence "eil" is not tagged with N.
>
> You need something like this:
>
>     <rule>
>       <pattern>
>         <token regexp="yes">u[ln]|a[nlr]</token>
>         <marker>
>            <and>
>              <token postag="V.*" postag_regexp="yes"/>
>              <token postag="N.*" postag_regexp="yes"/>
>            </and>
>         </marker>
>       </pattern>
>       <disambig action="filter" postag="N.*"/>
>     </rule>
>
>
> Regards,
> Jaume Ortolà
>
>
>
>
> 2014-09-03 6:22 GMT+02:00 Dominique Pellé <dominique.pe...@gmail.com>:
>>
>> Hi
>>
>> Have a look at the following debug output
>> of LanguageTool, where a token gets the nonsensical
>> POS tag "N.*" (multiple times) after a disambiguation
>> rule is applied.
>>
>> Is it a bug in the disambiguator?
>> Or am I writing an incorrect disambiguation rule?
>>
>> $ echo "An eil" | java -jar \
>>     languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar \
>>     -c utf-8 -l br -v
>> Expected text language: Breton
>> Working on STDIN...
>> 664 rules activated for language Breton
>> <S> An[mont/V pres 1 s,monet/V pres 1 s,an/D e sp,]
>> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,</S>,]<P/>
>> Disambiguator log:
>>
>> UR_N:2 eil[eilañ/V pres 3 s,eilañ/V impe 2 s,eil/K e sp o,eil/J,eilañ/SENT_END] ->
>> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/SENT_END]
>>
>>
>> Notice that the token "eil" gets the POS tag "N.*" (which
>> is an invalid POS tag; it's not meant to be a regexp) and,
>> furthermore, it gets that same POS tag 5 times after
>> disambiguation.
>>
>> The disambiguation rule UR_N:2 in
>>
>> languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/disambiguation.xml
>> is...
>>
>>     <rule>
>>       <pattern>
>>         <token regexp="yes">u[ln]|a[nlr]</token>
>>         <marker>
>>           <token postag="V.*" postag_regexp="yes"/>
>>         </marker>
>>       </pattern>
>>       <disambig action="filter" postag="N.*"/>
>>     </rule>
>>
>> The idea of the disambiguation rule is that, if the
>> word following "an" (or al, or ar, etc.) is a verb (V.*),
>> then keep only its noun POS tag (N.*)
>> in case it happens to be also a noun.
>> But obviously, this is not what's happening here.
>>
>> Regards
>> Dominique
>>
>>
>> ------------------------------------------------------------------------------
>> Slashdot TV.
>> Video for Nerds.  Stuff that matters.
>> http://tv.slashdot.org/
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
