Bugs item #3585215, was opened at 2012-11-07 10:25
Message generated for change (Comment added) made by milek_pl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=655717&aid=3585215&group_id=110216

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Private: No
Submitted By: Dominique Pelle (dominikoeo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Wrong case in suggestions

Initial Comment:
The following command gives two correction suggestions ("vBrest" or "Vrest"):

$ echo "Mont a reas da Brest." | \
  java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
Expected text language: Breton
Working on STDIN...
1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant an anv 
divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?
Suggestion: VBrest; Vrest
Mont a reas da Brest. 
               ^^^^^  
Time: 69ms for 1 sentences (14.5 sentences/sec)

Good so far. The case (uppercase second letter) may look strange, but it is
a correct spelling in Breton.

However, if I use the same sentence and the same options and only add the
--api flag, the suggestions become incorrect (wrong case):

$ echo "Mont a reas da Brest." | \
  java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br --api
<?xml version="1.0" encoding="UTF-8"?>
<matches software="LanguageTool" version="2.0-dev" buildDate="2012-11-07">
<error fromy="0" fromx="15" toy="0" tox="20" ruleId="KEMM_DA_ANV_DIVOUTIN" 
subId="3"  msg="Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da» gant 
an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe 'Vrest'?" 
replacements="VBrest#Vrest" context="Mont a reas da Brest. " contextoffset="15" 
offset="15" errorlength="5" category="Kemmadur"/>
</matches>
<!--
Time: 100ms for 1 sentences (10.0 sentences/sec)
-->

Notice that with the --api command-line option, LanguageTool suggests:
replacements="VBrest#Vrest"

This is incorrect; I would expect to get: replacements="vBrest#Vrest".

Yet notice that the suggestions are still correct in the message: 
msg="[...snip...] 'vBrest' pe 'Vrest'?"


----------------------------------------------------------------------

>Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 23:59

Message:
I can see. Your fix would indeed break a lot of things. We need to change
more code.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 19:41

Message:
I'm changing "Resolution" back from "Works for me" to "Accepted", since I
think the change of resolution was based on a misunderstanding. See my
previous comment.

If it should indeed work, please indicate how the
<suggestion>...</suggestion> can be written to properly suggest "vBrest"
(and not "VBrest") for the test sentence given in my previous comment. I've
tried different things, but none of them worked.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2013-03-06 01:47

Message:
Marcin wrote:

> We already have this option in match element. Instead of using
> the attribute on suggestion, you use <match> with
> case_conversion='preserve'.

No, it does not work.

If I change the Breton rule KEMM_DA_ANV_DIVOUTIN[3] as follows for
example:

   <rule>
     <pattern>
       <token>da</token>
       <marker>
         <token postag="Z [^M]*" postag_regexp="yes" regexp="yes">(?-i)B.*</token>
       </marker>
     </pattern>
     <message>Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger
     «\1» gant an anv divoutin. Ha fellout a rae deoc’h skrivañ
     <suggestion><match no="2" case_conversion="preserve" regexp_match="^" regexp_replace="v"/></suggestion>
     pe <suggestion><match no="2" regexp_match="^." regexp_replace="V"/></suggestion>?</message>
     <example type="incorrect">Mont a reas da <marker>Brest</marker>.</example>
     <example type="correct">Mont a reas da <marker>Vrest</marker>.</example>
     <example type="correct">Mont a reas da <marker>vBrest</marker>.</example>
   </rule>
   <rule>


(i.e. I used <suggestion><match no="2" case_conversion="preserve"
regexp_match="^" regexp_replace="v"/></suggestion>)
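To make the intended transformation concrete, here is a small Java sketch of what the two <match> elements are meant to produce for the token "Brest", followed by the first-letter uppercasing that RuleMatch applies to suggestions when the replaced text starts with an uppercase letter. It assumes that regexp_match/regexp_replace behave like a plain java.util.regex replacement, and uppercaseFirst is a hypothetical stand-in for StringTools.uppercaseFirstChar; this is not LanguageTool code.

```java
public class MatchReplaceDemo {

    // Hypothetical stand-in for StringTools.uppercaseFirstChar:
    // uppercases only the first character and leaves the rest untouched.
    static String uppercaseFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        String token = "Brest";

        // regexp_match="^" regexp_replace="v": insert "v" before the token.
        String first = token.replaceFirst("^", "v");    // "vBrest"
        // regexp_match="^." regexp_replace="V": replace the first letter.
        String second = token.replaceFirst("^.", "V");  // "Vrest"
        System.out.println(first + "; " + second);

        // "Brest" starts with an uppercase letter, so the suggestions are
        // uppercased afterwards, which turns "vBrest" into "VBrest".
        System.out.println(uppercaseFirst(first) + "; " + uppercaseFirst(second));
    }
}
```

Running it prints "vBrest; Vrest" and then "VBrest; Vrest": the second line matches what LanguageTool reports, which suggests the case is changed after the <match> element has done its work.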

I still get the wrong case for the suggestion:


$ echo "Da Brest." | \
  java -jar anguagetool/languagetool-standalone/target/LanguageTool-2.1-beta1/LanguageTool-2.1-beta1/languagetool-commandline.jar \
  -v -l br
Expected text language: Breton
Working on STDIN...
549 rules activated for language Breton
<S> Da[da/P,da/D e sp]  Brest[Brest/Z e s top,prestiñ/V pres 3 s
M:1:1a:,prestiñ/V impe 2 s M:1:1a:,prestañ/V pres 3 s M:1:1a:,prestañ/V
impe 2 s M:1:1a:,prest/N m s M:1:1a:,prest/J M:1:1a:].[</S>]<P/> 
Disambiguator log: 

1.) Line 1, column 4, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «Da»
gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'VBrest' pe
'Vrest'?
Suggestion: VBrest; Vrest
Da Brest. 
   ^^^^^  


Notice that LT suggests "VBrest" and "Vrest", but the suggestions I really
want are "vBrest" and "Vrest".

My previous comment gives a patch that fixes this, but I did not check it
in, as I was unsure whether it would have unwanted side effects elsewhere.


----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2013-03-06 01:01

Message:
Dominique, we already have this option in match element. Instead of using
the attribute on suggestion, you use <match> with
case_conversion='preserve'.

I'm changing it to "Works for me". Let me know if it does not work.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 21:08

Message:
I see: the RuleMatch(...) constructor in
src/main/java/org/languagetool/rules/RuleMatch.java uppercases the first
letter of each suggestion when the text being replaced starts with an
uppercase letter.

If I disable it as follows, I get the expected suggestion. However, this
change is probably wrong, since the current behavior is expected for
backward compatibility:

$ svn diff src/main/java/org/languagetool/rules/RuleMatch.java
Index: src/main/java/org/languagetool/rules/RuleMatch.java
===================================================================
--- src/main/java/org/languagetool/rules/RuleMatch.java (revision 8312)
+++ src/main/java/org/languagetool/rules/RuleMatch.java (working copy)
@@ -81,9 +81,9 @@
     while (matcher.find(pos)) {
       pos = matcher.end();
       String replacement = matcher.group(1);
-      if (startWithUppercase) {
-        replacement = StringTools.uppercaseFirstChar(replacement);
-      }
+      //if (startWithUppercase) {
+      //  replacement = StringTools.uppercaseFirstChar(replacement);
+      //}
       suggestedReplacements.add(replacement);
     }
   }


Putting the first letter of the suggestion in uppercase may be OK in many
cases but not all the time. So I think we'd need an option such as
<suggestion case_conversion="preserve">...</suggestion> in order to fix
this issue.
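The change proposed above can be sketched as follows: keep the default uppercasing, which existing rules rely on for backward compatibility, and skip it only when a rule opts out. This is a hypothetical, self-contained Java sketch, not the actual RuleMatch code: the SUGGESTION pattern, the extractSuggestions method and the preserveCase flag (standing in for the proposed case_conversion="preserve" attribute) are assumptions; only the loop body mirrors the lines shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuggestionCaseSketch {

    // Hypothetical pattern: the real one used by RuleMatch is not shown in the diff.
    private static final Pattern SUGGESTION =
            Pattern.compile("<suggestion>(.*?)</suggestion>");

    // Uppercases only the first character, like StringTools.uppercaseFirstChar.
    static String uppercaseFirstChar(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // preserveCase is the hypothetical case_conversion="preserve" switch.
    static List<String> extractSuggestions(String message,
                                           boolean startWithUppercase,
                                           boolean preserveCase) {
        List<String> suggestions = new ArrayList<>();
        Matcher matcher = SUGGESTION.matcher(message);
        int pos = 0;
        while (matcher.find(pos)) {
            pos = matcher.end();
            String replacement = matcher.group(1);
            // Default behavior is unchanged; only an explicit opt-out skips it.
            if (startWithUppercase && !preserveCase) {
                replacement = uppercaseFirstChar(replacement);
            }
            suggestions.add(replacement);
        }
        return suggestions;
    }

    public static void main(String[] args) {
        String msg = "... <suggestion>vBrest</suggestion> pe <suggestion>Vrest</suggestion>?";
        System.out.println(extractSuggestions(msg, true, false)); // [VBrest, Vrest]
        System.out.println(extractSuggestions(msg, true, true));  // [vBrest, Vrest]
    }
}
```

Because the flag defaults to the current behavior, rules that do not set it would be unaffected. The sketch simplifies by applying the flag once per match; a real attribute on <suggestion> would have to be tracked for each suggestion separately.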


----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-11-07 20:21

Message:
Oops, I wrote that the bug happens only with --api, but that's not true. I
don't know why I did not see it, but contrary to what I wrote in the bug
description, the bug also happens with:

$ echo "Mont a reas da Brest." | \
  java -jar ~/sb/languagetool/dist/LanguageTool.jar -l br
Expected text language: Breton
Working on STDIN...
1.) Line 1, column 16, Rule ID: KEMM_DA_ANV_DIVOUTIN[3]
Message: Ur c’hemmadur dre vlotaat a zlefe bezañ goude ar ger «da»
gant an anv divoutin. Ha fellout a rae deoc’h skrivañ 'vBrest' pe
'Vrest'?
Suggestion: VBrest; Vrest
Mont a reas da Brest.
               ^^^^^
Time: 69ms for 1 sentences (14.5 sentences/sec)


The line....

Suggestion: VBrest; Vrest

... is incorrect.  Expected would be:

Suggestion: vBrest; Vrest

----------------------------------------------------------------------


_______________________________________________
Languagetool-commits mailing list
Languagetool-commits@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-commits
