Re: Uppercase Sentence Start Rule (bug #185)

Jaume Ortolà i Font Tue, 02 Jul 2013 15:04:53 -0700

Hi,

You can see what has happened in the Wikipedia checks. See the links below.


For some languages, false alarms have been removed: French, Breton and
Catalan. That looks good.
Other languages have new alarms: English, German, Russian, Polish and
Italian. The reason is that these languages previously had special
treatment, which has now been removed.

The question is what to do with sentences that end with no ending
punctuation mark (.?!...). If we don't require uppercase sentence start in
these sentences, we avoid a lot of false alarms in lists, tables, etc., as
you can see in the Wikipedia check. On the other hand, we can get false
negatives, as in the reported bug, in titles, etc., when (by mistake or
not) there is no punctuation mark at the sentence end.

We can try a middle-ground solution: don't require an uppercase sentence
start when both the previous and the current sentence lack an ending
punctuation mark. This is the situation we find in a list or a table, where
we can surmise that it isn't an accumulation of mistakes.
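
To make the proposal concrete, here is a minimal sketch of the heuristic in
Python. It is only an illustration, not the actual UppercaseSentenceStartRule
code: the function names and the set of ending marks are assumptions made up
for this example, and it ignores everything else the real rule has to handle
(quotes, digits, language-specific exceptions, etc.).

```python
# Hypothetical sketch of the proposed heuristic; names are illustrative only.
ENDING_MARKS = (".", "?", "!", "…")  # assumed set of sentence-ending marks


def has_ending_punctuation(sentence: str) -> bool:
    """True if the sentence ends with one of the assumed ending marks."""
    return sentence.rstrip().endswith(ENDING_MARKS)


def starts_lowercase(sentence: str) -> bool:
    """True if the first non-space character is a lowercase letter."""
    stripped = sentence.lstrip()
    return bool(stripped) and stripped[0].islower()


def flag_lowercase_starts(sentences: list[str]) -> list[int]:
    """Return the indices of sentences that would trigger an alarm.

    A lowercase start is tolerated only when both the previous and the
    current sentence have no ending punctuation (list or table context).
    """
    flagged = []
    previous = None
    for index, sentence in enumerate(sentences):
        in_list_like_run = (
            previous is not None
            and not has_ending_punctuation(previous)
            and not has_ending_punctuation(sentence)
        )
        if starts_lowercase(sentence) and not in_list_like_run:
            flagged.append(index)
        previous = sentence
    return flagged
```

With this sketch, the list items in ["Ingredients:", "flour", "sugar"] are
not flagged, while a lone lowercase title with no punctuation, as in the
reported bug, still is, because it has no unpunctuated sentence before it.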

What do you think? Any ideas?

Regards,
Jaume Ortolà


LanguageTool Nightly Diff Overview 2013-07-02 22:20

   This page lists the results of our automatic nightly testing against a
   fixed Wikipedia corpus with 1000 articles per language.

   Changes 2013-07-01 22:20 to 2013-07-02 22:20
   Version: 2.3-SNAPSHOT (2013-07-02 22:02)
   [1]Changed: en
   [2]Changed: de
   [3]Changed: fr
   [4]Changed: ru
   [5]Changed: br
   [6]Changed: ca
   [7]Changed: pl
   [8]Changed: it

   Total runtime: 2013-07-02 22:20 to 2013-07-02 23:10

References

   1. http://languagetool.org/regression-tests/20130702/result_en_20130702.html
   2. http://languagetool.org/regression-tests/20130702/result_de_20130702.html
   3. http://languagetool.org/regression-tests/20130702/result_fr_20130702.html
   4. http://languagetool.org/regression-tests/20130702/result_ru_20130702.html
   5. http://languagetool.org/regression-tests/20130702/result_br_20130702.html
   6. http://languagetool.org/regression-tests/20130702/result_ca_20130702.html
   7. http://languagetool.org/regression-tests/20130702/result_pl_20130702.html
   8. http://languagetool.org/regression-tests/20130702/result_it_20130702.html


2013/7/2 Jaume Ortolà i Font <jaumeort...@gmail.com>

> Hi,
>
> There is a bug report about the behavior of UppercaseSentenceStartRule:
>
> https://sourceforge.net/p/languagetool/bugs/185/
>
> I think that the only situation in which we can safely prevent the rule
> from matching is when the previous sentence ends with a comma or a
> semicolon. So I propose to implement this for all languages.
>
> Perhaps we can do the same when the previous sentence ends with no
> punctuation mark at all. This could be useful for table cells, but
> sometimes there will be ambiguities. I am not sure.
>
> The current implementation looks at the sentence end to decide what to do
> at the start of the same sentence. I think this makes no sense and causes
> false negatives.
>
> I can make some changes and we'll be able to see what happens in the
> Wikipedia checks.
>
> Regards,
> Jaume Ortolà
>
>
