Bugs item #3585215, was opened at 2012-11-07 10:25
Message generated for change (Comment added) made by milek_pl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=655717&aid=3585215&group_id=110216

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: Wont Fix
Priority: 5
Private: No
Submitted By: Dominique Pelle (dominikoeo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Wrong case in suggestions

Initial Comment:
The following command gives 2 correction suggestions ("vBrest" or "Vrest"):

    $ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
    Expected text language: Breton
    Working on STDIN...
    1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
    Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?
    Suggestion: VBrest; Vrest
    Mont a reas da Brest.
                   ^^^^^
    Time: 69ms for 1 sentences (14.5 sentences/sec)

Good so far. The case (uppercase in the second letter) may look strange,
but that is a correct spelling in Breton.

However, if I use the same sentence and the same options but add the
--api flag, then the suggestions become incorrect (wrong case):

    $ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br --api
    <?xml version="1.0" encoding="UTF-8"?>
    <matches software="LanguageTool" version="2.0-dev" buildDate="2012-11-07">
    <error fromy="0" fromx="15" toy="0" tox="20" ruleId="KEMM_DA_ANV_DIVOUTIN" subId="3" msg="Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?" replacements="VBrest#Vrest" context="Mont a reas da Brest. " contextoffset="15" offset="15" errorlength="5" category="Kemmadur"/>
    </matches>
    <!-- Time: 100ms for 1 sentences (10.0 sentences/sec) -->

Notice that with the command line option --api, LanguageTool suggests:

    replacements="VBrest#Vrest"

This is incorrect. I would expect to get:

    replacements="vBrest#Vrest"

Yet notice that the suggestions are still correct in the message:

    msg="[...snip...] 'vBrest' pe 'Vrest'?"

----------------------------------------------------------------------

>Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-07 01:35

Message:
Great. This is a fairly counterintuitive case, I will add this to our wiki.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-07 01:00

Message:
Yes, that indeed works in Breton rule KEMM_DA_ANV_DIVOUTIN:

    <suggestion><match no="2" case_conversion="startlower" regexp_match="^" regexp_replace="v"/></suggestion>

Thanks for the help. I'm fine with the resolution "Wont Fix".

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-07 00:08

Message:
OK. I can see what the problem was. The 'preserve' attribute means
'preserve the case scheme of the matched word'. Here, you actually change
it. So you need to use 'startlower'. It works perfectly then, at least on
my computer. So there's nothing to fix.

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 23:59

Message:
I can see. Your fix would indeed break a lot of things. We need to change
more code.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 19:41

Message:
I'm changing "Resolution" back from "Works for me" to "Accepted", since I
think the change of resolution was based on a misunderstanding. See my
previous comment.

If it should indeed work, please indicate how the
<suggestion>...</suggestion> can be written to properly suggest "vBrest"
(and not "VBrest") with the test sentence given in my previous comment.
I've tried different things, but none of them worked.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 01:47

Message:
Marcin wrote:

> We already have this option in match element. Instead of using
> the attribute on suggestion, you use <match> with
> case_conversion='preserve'.

No, it does not work. If I change the Breton rule
KEMM_DA_ANV_DIVOUTIN[3] as follows for example:

    <rule>
      <pattern>
        <token>da</token>
        <marker>
          <token postag="Z [^M]*" postag_regexp="yes" regexp="yes">(?-i)B.*</token>
        </marker>
      </pattern>
      <message>Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «\1» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ <suggestion><match no="2" case_conversion="preserve" regexp_match="^" regexp_replace="v"/></suggestion> pe <suggestion><match no="2" regexp_match="^." regexp_replace="V"/></suggestion>?</message>
      <example type="incorrect">Mont a reas da <marker>Brest</marker>.</example>
      <example type="correct">Mont a reas da <marker>Vrest</marker>.</example>
      <example type="correct">Mont a reas da <marker>vBrest</marker>.</example>
    </rule>
    <rule>

(i.e. I used <suggestion><match no="2" case_conversion="preserve"
regexp_match="^" regexp_replace="v"/></suggestion>)

I still get the wrong case for the suggestion:

    $ echo "Da Brest." | \
      java -jar anguagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -v -l br
    Expected text language: Breton
    Working on STDIN...
    549 rules activated for language Breton
    <S> Da[da/P,da/D e sp] Brest[Brest/Z e s top,prestiñ/V pres 3 s M:1:1a:,prestiñ/V impe 2 s M:1:1a:,prestañ/V pres 3 s M:1:1a:,prestañ/V impe 2 s M:1:1a:,prest/N m s M:1:1a:,prest/J M:1:1a:].[</S>]<P/>
    Disambiguator log:
    1.) Line 1, column 4, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
    Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «Da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'VBrest' pe 'Vrest'?
    Suggestion: VBrest; Vrest
    Da Brest.
       ^^^^^

Notice that LT suggests "VBrest" and "Vrest", but what I really wanted
are the 2 suggestions "vBrest" and "Vrest".

My previous comment gives a patch to fix this, but I did not check it in,
as I was unsure whether there were any unwanted side effects elsewhere.

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 01:01

Message:
Dominique, we already have this option in match element. Instead of using
the attribute on suggestion, you use <match> with
case_conversion='preserve'. I'm changing to "works for me". Let me see if
it does not work.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 21:08

Message:
I see, the function RuleMatch(...) in
src/main/java/org/languagetool/rules/RuleMatch.java sets the first letter
to uppercase when what is being replaced starts with uppercase. If I
disable it as follows, then I get the correct suggestion. However, the
change is likely to be incorrect, as the current behavior is expected for
backward compatibility:

    $ svn diff src/main/java/org/languagetool/rules/RuleMatch.java
    Index: src/main/java/org/languagetool/rules/RuleMatch.java
    ===================================================================
    --- src/main/java/org/languagetool/rules/RuleMatch.java (revision 8312)
    +++ src/main/java/org/languagetool/rules/RuleMatch.java (working copy)
    @@ -81,9 +81,9 @@
           while (matcher.find(pos)) {
             pos = matcher.end();
             String replacement = matcher.group(1);
    -        if (startWithUppercase) {
    -          replacement = StringTools.uppercaseFirstChar(replacement);
    -        }
    +        //if (startWithUppercase) {
    +        //  replacement = StringTools.uppercaseFirstChar(replacement);
    +        //}
             suggestedReplacements.add(replacement);
           }
         }

Putting the first letter of the suggestion in uppercase may be OK in many
cases but not all the time.
So I think we'd need an option such as
<suggestion case_conversion="preserve">...</suggestion> in order to fix
this issue.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 20:21

Message:
Oops, I wrote that the bug happens with --api only, but that's not true.
I don't know why I did not see it, but contrary to what I wrote in the bug
description, the bug also happens with:

    $ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
    Expected text language: Breton
    Working on STDIN...
    1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
    Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?
    Suggestion: VBrest; Vrest
    Mont a reas da Brest.
                   ^^^^^
    Time: 69ms for 1 sentences (14.5 sentences/sec)

The line...

    Suggestion: VBrest; Vrest

... is incorrect. Expected would be:

    Suggestion: vBrest; Vrest

----------------------------------------------------------------------
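
The difference between case_conversion="preserve" and
case_conversion="startlower" discussed in this thread can be illustrated
with a small Java sketch. It is a simplified model written for this
report, not LanguageTool's actual code: the class CaseConversionSketch,
the enum CaseConversion and the method suggest are hypothetical names, and
the only behavior modeled is the one stated in the comments above
('preserve' reapplies the case scheme of the matched word, 'startlower'
lowercases the first letter).

```java
public class CaseConversionSketch {

    // Hypothetical stand-in for the two case_conversion values discussed in the thread.
    enum CaseConversion { PRESERVE, STARTLOWER }

    // Applies regexp_match/regexp_replace to the matched token, then the case conversion.
    // Assumption: 'preserve' is modeled only for words starting with an uppercase letter.
    static String suggest(String matched, String regexpMatch,
                          String regexpReplace, CaseConversion conversion) {
        String result = matched.replaceFirst(regexpMatch, regexpReplace);
        if (result.isEmpty()) {
            return result;
        }
        switch (conversion) {
            case PRESERVE:
                // Keep the case scheme of the matched word: "Brest" starts uppercase,
                // so the suggestion is forced to start uppercase as well.
                if (!matched.isEmpty() && Character.isUpperCase(matched.charAt(0))) {
                    return Character.toUpperCase(result.charAt(0)) + result.substring(1);
                }
                return result;
            case STARTLOWER:
                // Lowercase only the first letter; the rest is left untouched.
                return Character.toLowerCase(result.charAt(0)) + result.substring(1);
            default:
                return result;
        }
    }

    public static void main(String[] args) {
        // regexp_match="^" regexp_replace="v" on the matched token "Brest"
        System.out.println(suggest("Brest", "^", "v", CaseConversion.PRESERVE));   // VBrest
        System.out.println(suggest("Brest", "^", "v", CaseConversion.STARTLOWER)); // vBrest
    }
}
```

In this model the prefix "v" is added before the case conversion runs,
which is why 'preserve' turns the wanted "vBrest" into "VBrest", while
'startlower' leaves the inner capital "B" intact.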