Marcin Miłkowski <[email protected]> wrote:
> On 2013-03-17 18:30, Dominique Pellé wrote:
>> Hi
>>
>> In the Breton disambiguation file
>> languagetool-language-modules/br/target/classes/org/languagetool/resource/br/disambiguation.xml
>> I have the following immunization rule:
>>
>> <rule id="FRANCE_3" name="France 3">
>>   <pattern>
>>     <token>France</token>
>>     <token regexp="yes">[23]|Bleue</token>
>>   </pattern>
>>   <disambig action="immunize"/>
>> </rule>
>>
>> Yet I get this kind of error:
>>
>> $ echo "France 3 a zo ur chadenn skinwel." | \
>>   java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br
>> Expected text language: Breton
>> Working on STDIN...
>> 1.) Line 1, column 1, Rule ID: BR_TOPO
>> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h skrivañ 'Frañs' pe 'bro-C’hall'?
>> Suggestion: Frañs; bro-C’hall
>> France 3 a zo ur chadenn skinwel.
>> ^^^^^^
>>
>> Isn't this a bug? The words "France 3" should have been immunized,
>> so I did not expect to get the error.
>>
>> I assume that it happens because the rule BR_TOPO is a Java rule,
>> and somehow immunization does not work with Java rules.
>
> This is even explicitly stated in our wiki:
>
> "Java rules can ignore immunization - it's up to their authors to
> respect immunization."
>
> If you want to respect immunization, simply check whether the token
> isImmunized(), and if so, never report such a token (as is done in
> pattern rules).
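For illustration, here is a minimal, self-contained sketch of the check described above. It deliberately does not use LanguageTool's own classes: `ImmunizationSketch`, `Token`, `TopoRule` and `tokenize` are hypothetical stand-ins, and only the `isImmunized()` name is borrowed from the thread. The rule skips any immunized token before matching it against a list of French place names, which is roughly what a Java rule such as BR_TOPO would need to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ImmunizationSketch {

    // Hypothetical, simplified token: just the surface form and an
    // immunization flag (not LanguageTool's real token class).
    static final class Token {
        final String text;
        private boolean immunized;

        Token(String text) { this.text = text; }

        void immunize() { immunized = true; }

        boolean isImmunized() { return immunized; }
    }

    // Hypothetical Java rule in the spirit of BR_TOPO: reports tokens whose
    // text is a known French place name, unless they are immunized.
    static final class TopoRule {
        private final Set<String> frenchNames;

        TopoRule(Set<String> frenchNames) { this.frenchNames = frenchNames; }

        // Returns the indices of the tokens that would be reported.
        List<Integer> match(List<Token> tokens) {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                Token t = tokens.get(i);
                if (t.isImmunized()) {
                    continue; // immunized by a disambiguation rule: never report
                }
                if (frenchNames.contains(t.text)) {
                    hits.add(i);
                }
            }
            return hits;
        }
    }

    // Naive whitespace tokenizer, for this sketch only.
    static List<Token> tokenize(String sentence) {
        List<Token> tokens = new ArrayList<>();
        for (String word : sentence.split("\\s+")) {
            tokens.add(new Token(word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        TopoRule rule = new TopoRule(Set.of("France"));
        List<Token> tokens = tokenize("France 3 a zo ur chadenn skinwel .");
        System.out.println(rule.match(tokens)); // [0]: "France" is flagged

        // Simulate what FRANCE_3 is meant to do: immunize both tokens.
        tokens.get(0).immunize();
        tokens.get(1).immunize();
        System.out.println(rule.match(tokens)); // []: nothing is reported
    }
}
```

Running `main` prints `[0]` and then `[]`: once both tokens of "France 3" are immunized, the rule stays silent, while an unimmunized "France" is still reported.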
OK, thanks, I just tried it. That works.

>> Another oddity is the output of verbose mode with the disambiguation
>> rule:
>>
>> $ echo "France 3 a zo ur chadenn skinwel." | \
>>   java -jar languagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -l br -v
>> Expected text language: Breton
>> Working on STDIN...
>> 566 rules activated for language Breton
>> <S> France[France/Z e s top] 3[3] a[mont/V pres 3 s,mont/V impe 2 s,monet/V pres 3 s,monet/V impe 2 s,a/P,a/N m sp,a/L a] zo[teiñ/V pres 3 s M:2:,teiñ/V impe 2 s M:2:,bezañ/V pres 3 s] ur[un/D e sp] chadenn[chadenn/N f s] skinwel[skinwel/N m s].[</S>]<P/>
>> Disambiguator log:
>>
>> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>>
>> UR_N:1 chadenn[chadennañ/V pres 3 s,chadennañ/V impe 2 s,chadenn/N f s] -> chadenn[chadenn/N f s]
>>
>> 1.) Line 1, column 1, Rule ID: BR_TOPO
>> Message: France zo un anv lec’h gallek. Ha fellout a rae deoc’h skrivañ 'Frañs' pe 'bro-C’hall'?
>> Suggestion: Frañs; bro-C’hall
>> France 3 a zo ur chadenn skinwel.
>> ^^^^^^
>>
>> Notice that verbose mode outputs:
>>
>> FRANCE_3:1 France[France/Z e s top*] -> France[France/Z e s top*]
>>
>> This is odd: I did not put any marker in the disambiguation rule,
>> which contains 2 tokens, so why does it output something only for the
>> first token of the rule?
>
> Maybe it just applies immunization to a single token anyway. Probably a
> bug. You could confirm it by checking whether "3" has
> isImmunized() == true.

I'll check that later.

Thanks again
Dominique
_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
