Re: improving LT coverage

Dominique Pellé Wed, 08 Oct 2014 11:57:49 -0700

Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> I'm looking for ideas to systematically improve LT's error coverage. The
> last years, I've mostly worked by simply adding rules for errors that I
> coincidentally found on the web or in emails. Is anybody of you working
> systematically in the sense that they have a list of grammar errors for
> which they develop rules?


I have no such list.  I mostly get ideas at random :-)
As a language maintainer, it's easy to see false positives,
but harder to find false negatives, i.e. errors not yet
detected by LanguageTool.

One way to improve coverage would be to compare LT
with other grammar checkers. Most of them are commercial.
I don't have any of them, but I suppose some users
do (MS Word, etc.)

Another way would be to make it easier for users to submit
sentences with errors not yet detected by LT.
At the moment, users can submit issues, but that seems
too heavyweight for suggesting simple error-detection
ideas. I see that for Grammalecte (a French grammar
checker), users simply post ideas about undetected errors
in a dedicated forum thread.  See:

http://dicollecte.org/thread.php?prj=fr&t=167

So we could do something similar for LanguageTool:
create a dedicated thread for each language in the forum at
https://languagetool.org/forum

> Here's one idea, but I'm looking for other approaches as well: We could
> develop a categorization of errors and once we consider it mostly
> complete, we add examples for all languages and whether the errors are
> detected. There are several uses for such a categorization:
> -for languages with a lot of rules: find out what still needs to be done
> -for new languages: offer a way to add new rules more systematically
> -for users: document coverage
>
> Here's an example how the "Grammar" part might look like:
>
> Grammar
>         agreement
>                 noun phrase agreement (a bicycles; two bicycle)
>                 verb phrase agreement (he walk; I walks)
>         word order
>         missing word
>                 missing article
>                 ...
>         superfluous word
>                 word repetition
>                 ...
>         wrong comparative (most oldest)
>         wrong preposition (at Thursday vs. on Thursday)
>
> What do you think, would that be a viable approach? Do you have other
> ideas?

I'm not sure that would work well across languages.
The kinds of errors to detect are too language-specific in
my opinion. But a few rules would certainly apply to
several languages with just simple translations
or minor changes.  The rule detecting the error
*Linux Torvalds* instead of "Linus Torvalds" (and likewise for
many other famous names that are often misspelled) is a good
example. We could come up with a convention for annotating such
rules, so that developers can glance at the rules for other
languages, recognize such rules and adapt them to
their own language.
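
To illustrate why such a rule carries over between languages, here
is a rough sketch in Python. It is not LT's actual rule format: the
table and the function name are made up for illustration, and only
the "Linux Torvalds" entry comes from this discussion. The point is
that nothing in the check depends on the language of the text.

```python
import re

# Hypothetical table: misspelled form -> correct form.
# Only this one entry is taken from the discussion above.
NAME_FIXES = {
    "Linux Torvalds": "Linus Torvalds",
}


def find_misspelled_names(text, fixes=NAME_FIXES):
    """Return (start, end, wrong, suggestion) for each misspelled name found."""
    matches = []
    for wrong, right in fixes.items():
        # \b restricts matches to word boundaries; matching is case-sensitive.
        pattern = r"\b" + re.escape(wrong) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), wrong, right))
    return sorted(matches)


if __name__ == "__main__":
    for hit in find_misspelled_names("Linux Torvalds created the kernel."):
        print(hit)
```

Since only the surrounding message text would need translating, a
rule of this kind could be shared across languages almost as-is.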

Regards
Dominique

