Bugs item #3585215, was opened at 2012-11-07 10:25
Message generated for change (Comment added) made by milek_pl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=655717&aid=3585215&group_id=110216
Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Private: No
Submitted By: Dominique Pelle (dominikoeo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Wrong case in suggestions

Initial Comment:
The following command gives 2 correction suggestions ("vBrest" or "Vrest"):

$ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
Expected text language: Breton
Working on STDIN...
1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?
Suggestion: VBrest; Vrest
Mont a reas da Brest.
               ^^^^^
Time: 69ms for 1 sentences (14.5 sentences/sec)

Good so far. The case (uppercase in the second letter) may look strange,
but it is a correct spelling in Breton.

However, if I use the same sentence and the same options and only add
the --api flag, the suggestions become incorrect (wrong case):

$ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br --api
<?xml version="1.0" encoding="UTF-8"?>
<matches software="LanguageTool" version="2.0-dev" buildDate="2012-11-07">
<error fromy="0" fromx="15" toy="0" tox="20" ruleId="KEMM_DA_ANV_DIVOUTIN" subId="3" msg="Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?" replacements="VBrest#Vrest" context="Mont a reas da Brest. " contextoffset="15" offset="15" errorlength="5" category="Kemmadur"/>
</matches>
<!-- Time: 100ms for 1 sentences (10.0 sentences/sec) -->

Notice that with the command-line option --api, LanguageTool suggests:

  replacements="VBrest#Vrest"

This is incorrect. I would expect to get: replacements="vBrest#Vrest".
Yet notice that the suggestions are still correct in the message:
msg="[...snip...] 'vBrest' pe 'Vrest'?"

----------------------------------------------------------------------

>Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 23:59

Message:
I see. Your fix would indeed break a lot of things. We need to change
more code.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 19:41

Message:
I'm changing "Resolution" back from "Works for me" to "Accepted", since I
think the change of resolution was based on a misunderstanding. See my
previous comment. If it should indeed work, please indicate how the
<suggestion>...</suggestion> can be written to properly suggest "vBrest"
(and not "VBrest") when testing the sentence given in the previous comment.
I've tried different things, but none of them worked.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 01:47

Message:
Marcin wrote:
> We already have this option in match element. Instead of using
> the attribute on suggestion, you use <match> with
> case_conversion='preserve'.

No, it does not work. For example, if I change the Breton rule
KEMM_DA_ANV_DIVOUTIN[3] as follows:

<rule>
  <pattern>
    <token>da</token>
    <marker>
      <token postag="Z [^M]*" postag_regexp="yes" regexp="yes">(?-i)B.*</token>
    </marker>
  </pattern>
  <message>Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «\1» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ <suggestion><match no="2" case_conversion="preserve" regexp_match="^" regexp_replace="v"/></suggestion> pe <suggestion><match no="2" regexp_match="^." regexp_replace="V"/></suggestion>?</message>
  <example type="incorrect">Mont a reas da <marker>Brest</marker>.</example>
  <example type="correct">Mont a reas da <marker>Vrest</marker>.</example>
  <example type="correct">Mont a reas da <marker>vBrest</marker>.</example>
</rule>
<rule>

(i.e.
I used <suggestion><match no="2" case_conversion="preserve" regexp_match="^" regexp_replace="v"/></suggestion>),
I still get the wrong case for the suggestion:

$ echo "Da Brest." | \
  java -jar anguagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar -v -l br
Expected text language: Breton
Working on STDIN...
549 rules activated for language Breton
<S> Da[da/P,da/D e sp] Brest[Brest/Z e s top,prestiñ/V pres 3 s M:1:1a:,prestiñ/V impe 2 s M:1:1a:,prestañ/V pres 3 s M:1:1a:,prestañ/V impe 2 s M:1:1a:,prest/N m s M:1:1a:,prest/J M:1:1a:].[</S>]<P/>
Disambiguator log:

1.) Line 1, column 4, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «Da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'VBrest' pe 'Vrest'?
Suggestion: VBrest; Vrest
Da Brest.
   ^^^^^

Notice that LT suggests "VBrest" and "Vrest", but what I really wanted
are the two suggestions "vBrest" and "Vrest". My previous comment gives a
patch to fix this, but I did not check it in, as I was unsure whether it
would have unwanted side effects elsewhere.

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 01:01

Message:
Dominique, we already have this option in match element. Instead of using
the attribute on suggestion, you use <match> with case_conversion='preserve'.
I'm changing to "works for me". Let me see if it does not work.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 21:08

Message:
I see, the function RuleMatch(...) in
src/main/java/org/languagetool/rules/RuleMatch.java sets the first letter
in uppercase when what is being replaced starts with uppercase. If I
disable it as follows, then I get the correct suggestion.
However, this change is likely to be incorrect, as the current behavior
is expected for backward compatibility:

$ svn diff src/main/java/org/languagetool/rules/RuleMatch.java
Index: src/main/java/org/languagetool/rules/RuleMatch.java
===================================================================
--- src/main/java/org/languagetool/rules/RuleMatch.java (revision 8312)
+++ src/main/java/org/languagetool/rules/RuleMatch.java (working copy)
@@ -81,9 +81,9 @@
       while (matcher.find(pos)) {
         pos = matcher.end();
         String replacement = matcher.group(1);
-        if (startWithUppercase) {
-          replacement = StringTools.uppercaseFirstChar(replacement);
-        }
+        //if (startWithUppercase) {
+        //  replacement = StringTools.uppercaseFirstChar(replacement);
+        //}
         suggestedReplacements.add(replacement);
       }
     }

Putting the first letter of the suggestion in uppercase may be OK in many
cases, but not always. So I think we would need an option such as
<suggestion case_conversion="preserve">...</suggestion> to fix this issue.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 20:21

Message:
Oops, I wrote that the bug happens only with --api, but that's not true.
I don't know why I did not see it, but contrary to what I wrote in the bug
description, the bug also happens with:

$ echo "Mont a reas da Brest." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
Expected text language: Breton
Working on STDIN...
1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?
Suggestion: VBrest; Vrest
Mont a reas da Brest.
               ^^^^^
Time: 69ms for 1 sentences (14.5 sentences/sec)

The line

  Suggestion: VBrest; Vrest

is incorrect.
The expected line would be:

  Suggestion: vBrest; Vrest

----------------------------------------------------------------------

_______________________________________________
Languagetool-commits mailing list
Languagetool-commits@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-commits
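The capitalisation problem discussed in this thread can be reproduced with a small, self-contained Java sketch. This is a minimal illustration under stated assumptions, not LanguageTool code. The class SuggestionCaseSketch, its suggestions(...) method, the local uppercaseFirstChar helper and the parsing of a case_conversion="preserve" attribute on <suggestion> are all hypothetical. The sketch first applies the two regexp_match/regexp_replace pairs from rule KEMM_DA_ANV_DIVOUTIN[3] to "Brest". It then mimics the "uppercase the first letter if the matched text starts uppercase" step shown in the RuleMatch.java diff, once without and once with the proposed per-suggestion opt-out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch of the suggestion post-processing discussed in bug #3585215.
 * Hypothetical: class/method names and the case_conversion="preserve" attribute
 * on <suggestion> are assumptions, not LanguageTool's actual API.
 */
public class SuggestionCaseSketch {

    // Group 1 is non-null when the hypothetical preserve attribute is present;
    // group 2 is the suggestion text itself.
    private static final Pattern SUGGESTION =
            Pattern.compile("<suggestion( case_conversion=\"preserve\")?>(.*?)</suggestion>");

    // Local stand-in for an "uppercase first character" helper.
    static String uppercaseFirstChar(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Collects suggestions from a rule message. If the matched text starts with
    // an uppercase letter, each suggestion is capitalised unless it carries the
    // hypothetical preserve attribute.
    static List<String> suggestions(String message, boolean startWithUppercase) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SUGGESTION.matcher(message);
        while (matcher.find()) {
            boolean preserve = matcher.group(1) != null;
            String replacement = matcher.group(2);
            if (startWithUppercase && !preserve) {
                replacement = uppercaseFirstChar(replacement);
            }
            result.add(replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        String token = "Brest";
        // The two regexp_match / regexp_replace pairs used in the rule:
        String mutated1 = token.replaceFirst("^", "v");   // prepend "v" -> "vBrest"
        String mutated2 = token.replaceFirst("^.", "V");  // replace first char -> "Vrest"
        boolean startWithUppercase = Character.isUpperCase(token.charAt(0));

        String current = "<suggestion>" + mutated1 + "</suggestion> pe <suggestion>"
                + mutated2 + "</suggestion>";
        String proposed = "<suggestion case_conversion=\"preserve\">" + mutated1
                + "</suggestion> pe <suggestion>" + mutated2 + "</suggestion>";

        System.out.println(suggestions(current, startWithUppercase));   // [VBrest, Vrest]
        System.out.println(suggestions(proposed, startWithUppercase));  // [vBrest, Vrest]
    }
}
```

Running main should print [VBrest, Vrest] for the unconditional capitalisation and [vBrest, Vrest] when the first suggestion opts out. The opt-out is per suggestion, rather than a global removal of the capitalisation as in the svn diff. That design choice keeps the default behaviour that, as noted in the thread, other rules rely on.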