Hi, I've just committed a German rule for subject-verb agreement, and I'm posting it here because it uses an approach that might be useful for other languages, too.
You can find the documentation at http://wiki.languagetool.org/german-agreement-check

The most interesting part is probably the chunker, i.e. the detection of phrases. I tried OpenNLP with its stochastic chunker and it worked quite well, but it only finds small chunks, not complex ones. For the agreement check, we need complex chunks like "das große Haus und der Garten": "das große Haus" is one chunk, "der Garten" is another, and together they form one complex chunk. So rules are needed on top of OpenNLP to find these complex chunks.

It turned out that once you use rules to detect complex chunks anyway, you might as well replace the OpenNLP chunker completely with a few more rules. This keeps LT from growing by another 10MB (the size of the models used by OpenNLP).

The rules are expressed in OpenRegex syntax, which is similar to what LT does in its patterns, but much more compact. You can look at some patterns here:
https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/de/src/main/java/org/languagetool/chunking/GermanChunker.java#L94

Unlike LT's pattern syntax, this is also a real regular expression syntax, i.e. you can use operators like *, +, and ? with their usual regular expression semantics, and you can nest expressions with parentheses.

Currently, OpenRegex is a dependency only for German, but if you want to use it in your language to detect chunks or for something else, we could move it to core.

Regards
Daniel
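P.S. To illustrate the idea of finding complex chunks with a regular expression over tokens, here is a minimal sketch. It is not the code from GermanChunker and it does not use OpenRegex: it uses plain java.util.regex over a string of made-up one-letter tags (D = determiner, A = adjective, N = noun, C = conjunction, V = verb). The class name ChunkSketch and the method findComplexChunks are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a stand-in for "regular expressions over tokens".
// The one-letter tag set below is an assumption, not LT's real tag set.
class ChunkSketch {

    // A simple noun phrase: optional determiner, any number of adjectives, a noun.
    private static final String NP = "D?A*N";

    // A complex chunk: a noun phrase followed by one or more
    // "conjunction + noun phrase" groups (nested via parentheses).
    private static final Pattern COMPLEX = Pattern.compile(NP + "(?:C" + NP + ")+");

    // tags.charAt(i) is the tag of tokens[i], so match offsets are token indices.
    static List<String> findComplexChunks(String[] tokens, String tags) {
        if (tokens.length != tags.length()) {
            throw new IllegalArgumentException("need exactly one tag per token");
        }
        List<String> chunks = new ArrayList<>();
        Matcher m = COMPLEX.matcher(tags);
        while (m.find()) {
            String[] span = Arrays.copyOfRange(tokens, m.start(), m.end());
            chunks.add(String.join(" ", span));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "gro\u00dfe" is the adjective from the example above, written with an
        // escape for the sharp s so the source file stays plain ASCII.
        String[] tokens = {"das", "gro\u00dfe", "Haus", "und", "der", "Garten", "sind", "alt"};
        String tags = "DANCDNVA";
        System.out.println(findComplexChunks(tokens, tags));
    }
}
```

With the tags given in main, the only match is the span covering "das große Haus und der Garten"; a sentence with a single noun phrase and no conjunction yields no complex chunk. A real chunker would match on token objects with full POS tags instead of single characters, which is what a token-level regex library is for.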
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel