Bugs item #3054895, was opened at 2010-08-28 06:57
Message generated for change (Comment added) made by milek_pl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=655717&aid=3054895&group_id=110216

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: 1.0
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Daniel Naber (dnaber)
Assigned to: Marcin Miłkowski (milek_pl)
Summary:  line & column numbers incorrect when LT reads from stdin

Initial Comment:
Copying Dominique Pellé's report from the mailing list:

Example:

================================================
# Sample file with errors on lines 0 and 2 (line 1 is empty).
$ cat test.txt
This is a test of of language tool.

This is is a test of language tool.

# When reading from a file, the line numbers (fromy and toy) look correct:
$ java -jar JLanguageTool/dist/LanguageTool.jar --api test.txt
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="0" fromx="15" toy="0" tox="21" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="of"
context="This is a test of of language tool.  This is is a test of
languag..." contextoffset="15" errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="10" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="is"
context="This is a test of of language tool.  This is is a test of
language tool. " contextoffset="42" errorlength="5"/>
</matches>
<!--
Time: 105ms for 2 sentences (19.0 sentences/sec)
-->

# Now when reading the same file test.txt from stdin...
$ java -jar JLanguageTool/dist/LanguageTool.jar --api - < test.txt
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="1" fromx="15" toy="1" tox="21" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="of"
context="This is a test of of language tool.  " contextoffset="15"
errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="11" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="is"
context="This is is a test of language tool. " contextoffset="5"
errorlength="5"/>
<!--
Time: 111ms for 3 sentences (27.0 sentences/sec)
-->
================================================

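Notice that, when reading from stdin, the fromy and toy fields are
incorrect (1 instead of 0 for the first error).  The tox of the second
error is also incorrect (11 instead of 10).  The contextoffset is also
different.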

----------------------------------------------------------------------

>Comment By: Marcin Miłkowski (milek_pl)
Date: 2012-06-24 12:33

Message:
closing then.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-06-24 11:10

Message:
> Dominique,
> could you check?

At least the 2 examples I gave are now fixed. Doing further tests... I
don't see anything wrong anymore.
I assume this bug can be marked as resolved now.  Changing resolution to
fixed.

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2012-06-18 10:51

Message:
When I look again, the contextoffset is fine (the context is different in
both cases, so it's OK). It seems the bug is fixed completely. Dominique,
could you check?

----------------------------------------------------------------------

Comment By: Marcin Miłkowski (milek_pl)
Date: 2012-06-18 10:25

Message:
In 1.8-dev I fixed some parts of this; now you get:

<error fromy="0" fromx="15" toy="0" tox="20"
ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word"
replacements="of" context="This is a test of of language tool.  This is is
a test of languag..." contextoffset="15" errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="10"
ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word"
replacements="is" context="This is a test of of language tool.  This is is
a test of language tool. " contextoffset="42" errorlength="5"/>

and 

<error fromy="0" fromx="15" toy="0" tox="20"
ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word"
replacements="of" context="This is a test of of language tool.  "
contextoffset="15" errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="10"
ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word"
replacements="is" context="This is is a test of language tool. "
contextoffset="5" errorlength="5"/>

Note the tox value: it is now consistent. As far as I can see, fromy and
toy are also correct, but there are outstanding problems with contextoffset
and the missing </matches> element.

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-06-17 12:20

Message:
Here is another case that gave a wrong column number:

$ (echo "This is"; echo "is an error.") | \
 java -jar LanguageTool.jar -l en --api

It gives:

fromy="1" fromx="5" toy="2" tox="10"

I fixed this one in this SVN checkin:

==========
r7389 | dominikoeo | 2012-06-17 21:06:34 +0200 (Sun, 17 Jun 2012) | 3 lines

- bug #3054895: the column reported by LanguageTool
  was sometimes wrong when error span the new line.
==========


However, this case is still wrong:

$ (echo "An test"; echo "An test") | \
 java -jar ~/sb/languagetool/dist/LanguageTool.jar -l en --api -d PHRASE_REPETITION
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="1" fromx="0" toy="1" tox="3" ruleId="EN_A_VS_AN" msg="Use 'A'
instead of 'An' if the following word doesn't start with a vowel sound,
e.g. 'a sentence', 'a university'" replacements="A" context="An test An
test " contextoffset="0" errorlength="2"/>
<error fromy="2" fromx="0" toy="2" tox="2" ruleId="EN_A_VS_AN" msg="Use 'A'
instead of 'An' if the following word doesn't start with a vowel sound,
e.g. 'a sentence', 'a university'" replacements="A" context="An test An
test " contextoffset="8" errorlength="2"/>
<!--
Time: 99ms for 1 sentences (10.1 sentences/sec)
-->

Notice that the first error has tox="3" and the second error has tox="2".
Both should be tox="2".
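
To make the expected values for both cases in this comment concrete, here is
a minimal sketch (Python, not the LanguageTool code or the r7389 change). The
names line_starts and match_span are hypothetical. It assumes 0-based lines
and columns, "\n" as the line separator, and an exclusive end column, as
suggested by errorlength="2" with the expected tox="2".

```python
from bisect import bisect_right


def line_starts(text):
    """Return the offsets at which each line of text begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def match_span(text, start, length):
    """Return (fromy, fromx, toy, tox) for a match at [start, start + length)."""
    starts = line_starts(text)
    end = start + length
    # bisect_right finds the last line whose start offset is <= the position.
    fromy = bisect_right(starts, start) - 1
    toy = bisect_right(starts, end) - 1
    return fromy, start - starts[fromy], toy, end - starts[toy]


# Two identical lines: both EN_A_VS_AN matches get the same columns.
identical = "An test\nAn test\n"
print(match_span(identical, 0, 2))
print(match_span(identical, 8, 2))

# A match that spans the line break ("is\nis").
spanning = "This is\nis an error.\n"
print(match_span(spanning, spanning.index("is\nis"), 5))
```

With these assumptions the columns are always relative to the line that
contains the position, so a match crossing the line break ends at a small
column on the next line.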

----------------------------------------------------------------------

Comment By: Dominique Pelle (dominikoeo)
Date: 2012-06-09 13:32

Message:
> Line numbers should be fixed in 1.0.1 but column numbers are sometimes
> still wrong.

Column numbers are still wrong in LanguageTool-1.8.

The following 2 lines are enough to reproduce the bug:

$ (echo "An test"; echo "An test") | \
 java -jar ~/sb/languagetool/dist/LanguageTool.jar -l en --api
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="1" fromx="0" toy="1" tox="3" ruleId="EN_A_VS_AN" msg="Use 'A'
instead of 'An' if the following word doesn't start with a vowel sound,
e.g. 'a sentence', 'a university'" replacements="A" context="An test An
test " contextoffset="0" errorlength="2"/>
<error fromy="1" fromx="0" toy="2" tox="15" ruleId="PHRASE_REPETITION"
subId="1"  msg="This phrase is duplicated. You should probably leave only
'An test'." replacements="An test" context="An test An test "
contextoffset="0" errorlength="15"/>
<error fromy="2" fromx="0" toy="2" tox="2" ruleId="EN_A_VS_AN" msg="Use 'A'
instead of 'An' if the following word doesn't start with a vowel sound,
e.g. 'a sentence', 'a university'" replacements="A" context="An test An
test " contextoffset="8" errorlength="2"/>
<!--
Time: 107ms for 1 sentences (9.3 sentences/sec)
-->

The 2 lines given to LT are identical.
Yet LT reports different columns for the 2 lines (tox=3 and then tox=2) in
the 2 EN_A_VS_AN errors.



----------------------------------------------------------------------

Comment By: Daniel Naber (dnaber)
Date: 2010-08-28 07:00

Message:
Line numbers should be fixed in 1.0.1 but column numbers are sometimes still
wrong.

----------------------------------------------------------------------


_______________________________________________
Languagetool-cvs mailing list
Languagetool-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-cvs
