Marcin Miłkowski <[email protected]> wrote:

> On 2013-03-17 18:30, Dominique Pellé wrote:
>> Hi
>>
>> In the Breton disambiguation file
>> languagetool-language-modules/br/target/classes/org/languagetool/resource/br/disambiguation.xml
>> I have the following immunization rule:
>>
>> <rule id="FRANCE_3" name="France 3">
>>    <pattern>
>>      <token>France</token>
>>      <token regexp="yes">[23]|Bleue</token>
>>    </pattern>
>>    <disambig action="immunize"/>
>> </rule>
>>
>> Yet I get this kind of error:
>>
>> $ echo "France 3 a zo ur chadenn skinwel." | \
>>    java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar \
>>    -l br
>> Expected text language: Breton
>> Working on STDIN...
>> 1.) Line 1, column 1, Rule ID: BR_TOPO
>> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h skrivañ 'Frañs' pe 'bro-C’hall'?
>> Suggestion: Frañs; bro-C’hall
>> France 3 a zo ur chadenn skinwel.
>> ^^^^^^
>>
>> Isn't this a bug?  The words "France 3" should have been immunized,
>> so I did not expect to get the error.
>>
>> I assume that it happens because the rule BR_TOPO is a Java rule
>> and somehow immunization does not work with Java rules.
>
> This is even explicitly stated in our wiki:
>
> "Java rules can ignore immunization - it's up to their authors to
> respect immunization."
>
> If you want to respect immunization, simply check whether the token's
> isImmunized() returns true, and if so, never report that token (as is
> done in pattern rules).


Ok, thanks, I just tried it. That works.
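
For reference, the check boils down to something like the sketch below. It is
only an illustration: SketchToken and ImmunizationSketch are hypothetical
stand-ins, not the real LanguageTool classes, and the toy rule merely mimics
what BR_TOPO does for the word "France". Only the isImmunized() check itself
comes from the advice above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a LanguageTool token; not the real class.
final class SketchToken {
    private final String text;
    private final boolean immunized;

    SketchToken(String text, boolean immunized) {
        this.text = text;
        this.immunized = immunized;
    }

    String getText() {
        return text;
    }

    // Mirrors the isImmunized() check mentioned in the thread.
    boolean isImmunized() {
        return immunized;
    }
}

// Hypothetical toy rule: flags the word "France" unless it was immunized.
final class ImmunizationSketch {
    static List<String> flaggedTokens(List<SketchToken> sentence) {
        List<String> flagged = new ArrayList<>();
        for (SketchToken token : sentence) {
            if (token.isImmunized()) {
                continue; // respect immunization: never report this token
            }
            if ("France".equals(token.getText())) {
                flagged.add(token.getText());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<SketchToken> immunized =
            List.of(new SketchToken("France", true), new SketchToken("3", true));
        List<SketchToken> plain =
            List.of(new SketchToken("France", false), new SketchToken("a", false));
        System.out.println(flaggedTokens(immunized)); // prints []
        System.out.println(flaggedTokens(plain));     // prints [France]
    }
}
```

In a real Java rule the same early "continue" would go at the top of the loop
over the sentence tokens, before any match is reported.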



>> Another oddity is the verbose-mode output for the disambiguation rule:
>>
>> $ echo "France 3 a zo ur chadenn skinwel." | \
>>    java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar \
>>    -l br -v
>> Expected text language: Breton
>> Working on STDIN...
>> 566 rules activated for language Breton
>> <S> France[France/Z e s top]  3[3]  a[mont/V pres 3 s,mont/V impe 2
>> s,monet/V pres 3 s,monet/V impe 2 s,a/P,a/N m sp,a/L a]  zo[teiñ/V
>> pres 3 s M:2:,teiñ/V impe 2 s M:2:,bezañ/V pres 3 s]  ur[un/D e sp]
>> chadenn[chadenn/N f s]  skinwel[skinwel/N m s].[</S>]<P/>
>> Disambiguator log:
>>
>> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>>
>> UR_N:1 chadenn[chadennañ/V pres 3 s,chadennañ/V impe 2 s,chadenn/N f s] -> chadenn[chadenn/N f s]
>>
>> 1.) Line 1, column 1, Rule ID: BR_TOPO
>> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h skrivañ 'Frañs' pe 'bro-C’hall'?
>> Suggestion: Frañs; bro-C’hall
>> France 3 a zo ur chadenn skinwel.
>> ^^^^^^
>>
>>
>> Notice that the verbose mode outputs:
>>
>> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>>
>> This is odd: the disambiguation rule contains 2 tokens and I did not put
>> any marker in it, so why does the log show only the first token of the
>> rule?
>
> Maybe it just applies immunization to a single token anyway. Probably a
> bug. You could confirm it by checking whether isImmunized() returns true
> for "3".

I'll check that later.

Thanks again
Dominique
