On 2013-03-17 18:30, Dominique Pellé wrote:
> Hi
>
> In the Breton disambiguation file
> languagetool-language-modules/br/target/classes/org/languagetool/resource/br/disambiguation.xml
> I have the following immunization rule:
>
> <rule id="FRANCE_3" name="France 3">
>    <pattern>
>      <token>France</token>
>      <token regexp="yes">[23]|Bleue</token>
>    </pattern>
>    <disambig action="immunize"/>
> </rule>
>
> Yet I get this kind of error:
>
> $ echo "France 3 a zo ur chadenn skinwel." | \
>    java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br
> Expected text language: Breton
> Working on STDIN...
> 1.) Line 1, column 1, Rule ID: BR_TOPO
> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h
> skrivañ 'Frañs' pe 'bro-C’hall'?
> Suggestion: Frañs; bro-C’hall
> France 3 a zo ur chadenn skinwel.
> ^^^^^^
>
> Isn't this a bug?  The words "France 3" should have been immunized,
> so I did not expect to get the error.
>
> I assume that it happens because the rule BR_TOPO is a Java rule
> and somehow immunization does not work with Java rules.

This is even explicitly stated in our wiki:

"Java rules can ignore immunization - it's up to their authors to 
respect immunization."

If you want your Java rule to respect immunization, check isImmunized() 
on each token and never report a match on an immunized token (this is 
what pattern rules already do).
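
A minimal sketch of that check, just to illustrate the idea. The Token 
class and findMatches() below are hypothetical stand-ins so the example 
runs on its own; they are not the LanguageTool API and not the actual 
BR_TOPO code. In a real Java rule the same test would be made on the 
analyzed tokens via isImmunized().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImmunizationSketch {

    // Hypothetical stand-in for an analyzed token; only the text and the
    // immunization flag are modelled here.
    static final class Token {
        private final String text;
        private final boolean immunized;

        Token(String text, boolean immunized) {
            this.text = text;
            this.immunized = immunized;
        }

        String getText() { return text; }
        boolean isImmunized() { return immunized; }
    }

    // Toy check in the spirit of BR_TOPO (assumption: the real rule is more
    // involved). Returns the positions of tokens that would be reported.
    static List<Integer> findMatches(List<Token> tokens) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Token token = tokens.get(i);
            if (token.isImmunized()) {
                continue; // respect immunization: never report this token
            }
            if ("France".equals(token.getText())) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Token> immunized = Arrays.asList(
                new Token("France", true), new Token("3", true));
        List<Token> plain = Arrays.asList(
                new Token("France", false), new Token("a", false));
        System.out.println(findMatches(immunized));
        System.out.println(findMatches(plain));
    }
}
```

With this guard in place, "France" is reported only when the 
disambiguator has not immunized it.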

>
> Another oddity is the output of the verbose mode with the disambiguation rule:
>
> $ echo "France 3 a zo ur chadenn skinwel." | \
>    java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br -v
> Expected text language: Breton
> Working on STDIN...
> 566 rules activated for language Breton
> <S> France[France/Z e s top]  3[3]  a[mont/V pres 3 s,mont/V impe 2
> s,monet/V pres 3 s,monet/V impe 2 s,a/P,a/N m sp,a/L a]  zo[teiñ/V
> pres 3 s M:2:,teiñ/V impe 2 s M:2:,bezañ/V pres 3 s]  ur[un/D e sp]
> chadenn[chadenn/N f s]  skinwel[skinwel/N m s].[</S>]<P/>
> Disambiguator log:
>
> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>
> UR_N:1 chadenn[chadennañ/V pres 3 s,chadennañ/V impe 2 s,chadenn/N f
> s] -> chadenn[chadenn/N f s]
>
> 1.) Line 1, column 1, Rule ID: BR_TOPO
> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h
> skrivañ 'Frañs' pe 'bro-C’hall'?
> Suggestion: Frañs; bro-C’hall
> France 3 a zo ur chadenn skinwel.
> ^^^^^^
>
>
> Notice that the verbose mode outputs:
>
> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>
> This is odd, since I did not put any marker in the disambiguation rule, which
> contains 2 tokens, so why does it output something only for the first token
> of the disambiguation rule?

Maybe immunization is applied only to a single token even though the 
pattern has two. That is probably a bug. You could confirm it by checking 
whether isImmunized() returns true for the token "3".

Regards
Marcin
>
> Regards
> Dominique
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>
