Daniel Naber <daniel.na...@languagetool.org> wrote:
> Hi,
>
> I'm looking for ideas to systematically improve LT's error coverage. The
> last years, I've mostly worked by simply adding rules for errors that I
> coincidentally found on the web or in emails. Is anybody of you working
> systematically in the sense that they have a list of grammar errors for
> which they develop rules?
I have no such list. I mostly get ideas at random :-)

As a language maintainer, it's easy to see false positives, but it's harder to find false negatives, i.e. errors not yet detected by LanguageTool.

One way to improve coverage would be to compare LT with other grammar checkers. Most of them are commercial. I don't have any of them, but I suppose some users do (MS Word, etc.).

Another way would be to make it easier for users to submit sentences containing errors that LT does not yet detect. At the moment, users can submit issues, but that seems too heavyweight for suggesting simple ideas for error detection. I see that for Grammalecte (a French grammar checker), users simply post ideas about undetected errors in a dedicated forum thread. See:
http://dicollecte.org/thread.php?prj=fr&t=167

So we could do something similar for LanguageTool: create a dedicated thread for each language in the forum at
https://languagetool.org/forum

> Here's one idea, but I'm looking for other approaches as well: We could
> develop a categorization of errors and once we consider it mostly
> complete, we add examples for all languages and whether the errors are
> detected. There are several uses for such a categorization:
> -for languages with a lot of rules: find out what still needs to be done
> -for new languages: offer a way to add new rules more systematically
> -for users: document coverage
>
> Here's an example how the "Grammar" part might look like:
>
> Grammar
>   agreement
>     noun phrase agreement (a bicycles; two bicycle)
>     verb phrase agreement (he walk; I walks)
>   word order
>   missing word
>     missing article
>     ...
>   superfluous word
>     word repetition
>     ...
>   wrong comparative (most oldest)
>   wrong preposition (at Thursday vs. on Thursday)
>
> What do you think, would that be a viable approach? Do you have other
> ideas?

I'm not sure that would work well across languages.
The kinds of errors to detect are too language-specific, in my opinion. But a few rules would certainly apply to several languages with just simple translations or minor changes. The rule detecting the error *Linux Torvalds* instead of "Linus Torvalds" (and likewise for many other famous names that are often misspelled) is a good example. We could come up with a convention for annotating such rules, so that developers can glance at rules in other languages, recognize them, and adapt them to their own language.

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel