On 2013-03-17 18:30, Dominique Pellé wrote:
> Hi
>
> In the Breton disambiguation file
> languagetool-language-modules/br/target/classes/org/languagetool/resource/br/disambiguation.xml
> I have the following immunization rule:
>
> <rule id="FRANCE_3" name="France 3">
>   <pattern>
>     <token>France</token>
>     <token regexp="yes">[23]|Bleue</token>
>   </pattern>
>   <disambig action="immunize"/>
> </rule>
>
> Yet I get this kind of error:
>
> $ echo "France 3 a zo ur chadenn skinwel." | \
>   java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br
> Expected text language: Breton
> Working on STDIN...
> 1.) Line 1, column 1, Rule ID: BR_TOPO
> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h
> skrivañ 'Frañs' pe 'bro-C’hall'?
> Suggestion: Frañs; bro-C’hall
> France 3 a zo ur chadenn skinwel.
> ^^^^^^
>
> Isn't this a bug? The words "France 3" should have been immunized,
> so I did not expect to get the error.
>
> I assume that it happens because the rule BR_TOPO is a Java rule
> and somehow immunization does not work with Java rules.

This is even explicitly stated in our wiki: "Java rules can ignore
immunization - it's up to their authors to respect immunization." If you
want the rule to respect immunization, check isImmunized() on each token
and never report an immunized token (as is done in pattern rules).

>
> Another oddity is the output of the verbose mode with the disambiguation rule:
>
> $ echo "France 3 a zo ur chadenn skinwel." | \
>   java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br -v
> Expected text language: Breton
> Working on STDIN...
> 566 rules activated for language Breton
> <S> France[France/Z e s top] 3[3] a[mont/V pres 3 s,mont/V impe 2
> s,monet/V pres 3 s,monet/V impe 2 s,a/P,a/N m sp,a/L a] zo[teiñ/V
> pres 3 s M:2:,teiñ/V impe 2 s M:2:,bezañ/V pres 3 s] ur[un/D e sp]
> chadenn[chadenn/N f s] skinwel[skinwel/N m s].[</S>]<P/>
> Disambiguator log:
>
> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>
> UR_N:1 chadenn[chadennañ/V pres 3 s,chadennañ/V impe 2 s,chadenn/N f
> s] -> chadenn[chadenn/N f s]
>
> 1.) Line 1, column 1, Rule ID: BR_TOPO
> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h
> skrivañ 'Frañs' pe 'bro-C’hall'?
> Suggestion: Frañs; bro-C’hall
> France 3 a zo ur chadenn skinwel.
> ^^^^^^
>
>
> Notice that the verbose mode outputs:
>
> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>
> This is odd, since I did not put any marker in the disambiguation rule which
> contains 2 tokens, so why does it output something only for the first token
> of the disambiguation rule?

Maybe it applies immunization to a single token only. That is probably a
bug. You could confirm it by checking whether isImmunized() returns true
for the token "3".

Regards
Marcin

>
> Regards
> Dominique
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
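
P.S. To illustrate the advice above about Java rules, here is a minimal, self-contained sketch of the check. It does not use the real LanguageTool classes: `Token` is a hypothetical stand-in for a token object that exposes `isImmunized()`, and `findFrenchPlaceNames` is a made-up, much simplified imitation of what a rule like BR_TOPO might do. Only the skip-if-immunized pattern is the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImmunizationSketch {

    /** Hypothetical stand-in for a token that carries an immunization flag. */
    static final class Token {
        private final String text;
        private final boolean immunized;

        Token(String text, boolean immunized) {
            this.text = text;
            this.immunized = immunized;
        }

        String getToken() { return text; }

        boolean isImmunized() { return immunized; }
    }

    /**
     * Toy check (an assumption, not the real BR_TOPO logic): reports the
     * word "France" unless the token has been immunized.
     */
    static List<String> findFrenchPlaceNames(List<Token> tokens) {
        List<String> matches = new ArrayList<>();
        for (Token token : tokens) {
            if (token.isImmunized()) {
                continue; // respect immunization: never report this token
            }
            if ("France".equals(token.getToken())) {
                matches.add(token.getToken());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample data: what the FRANCE_3 rule is meant to produce.
        List<Token> immunized = Arrays.asList(
                new Token("France", true), new Token("3", true), new Token("a", false));
        // Same words without any immunization applied.
        List<Token> plain = Arrays.asList(
                new Token("France", false), new Token("3", false), new Token("a", false));

        System.out.println(findFrenchPlaceNames(immunized)); // prints []
        System.out.println(findFrenchPlaceNames(plain));     // prints [France]
    }
}
```

In a real Java rule the same early `continue` would go at the top of the loop over the sentence's tokens, before any match is created.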
