Hi again,

actually, I think we should remove AnalyzedGermanToken: it is the only case where AT is not used. But AGT implements one important piece of functionality, namely analysis of the features encoded in the tags. This should be made general and available for all (tagged) languages, so that it is also usable in XML rules via token features. That way, regexes over POS tags (slow) would be replaced with very cheap operations on string features, and rules would be easier to read.
I suggest a simple config file per tagset that defines the possible attributes and their values, such as:

case = nom acc gen dat voc
num = pl sg

etc. Then one could use <token feature='case' value='nom'/> in XML rules, and elsewhere. Of course, we'd need some code that parses the POS tags to get the attributes. In most cases, we only need to define a delimiter (usually a space or a colon, sometimes the empty string) and chop the POS string at it.

What do you think?

Regards,
Marcin
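
P.S. To make the idea a bit more concrete, here is a rough sketch in Java. All names in it (TagsetFeatures, parse, hasFeature) are made up for illustration and nothing like this exists in the code base yet. It assumes that every value belongs to exactly one attribute, and the tag "subst:sg:nom:m1" is only an example of a colon-delimited tag.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch: maps chopped POS tag pieces to named features. */
class TagsetFeatures {
    // Reverse index built from the config: value -> attribute, e.g. "nom" -> "case".
    // Assumption: a value is listed under one attribute only (otherwise the last one wins).
    private final Map<String, String> attributeOfValue = new HashMap<>();
    private final String delimiter;

    /** config holds lines such as "case = nom acc gen dat voc". */
    TagsetFeatures(String config, String delimiter) {
        this.delimiter = delimiter;
        for (String line : config.split("\n")) {
            if (!line.contains("=")) {
                continue; // skip blank or malformed lines
            }
            String[] parts = line.split("=", 2);
            String attribute = parts[0].trim();
            for (String value : parts[1].trim().split("\\s+")) {
                if (!value.isEmpty()) {
                    attributeOfValue.put(value, attribute);
                }
            }
        }
    }

    /** Chops the POS tag at the delimiter and keeps the pieces the config knows. */
    Map<String, String> parse(String posTag) {
        // An empty delimiter means one character per piece.
        String[] pieces = delimiter.isEmpty()
                ? posTag.split("")
                : posTag.split(Pattern.quote(delimiter));
        Map<String, String> features = new LinkedHashMap<>();
        for (String piece : pieces) {
            String attribute = attributeOfValue.get(piece);
            if (attribute != null) {
                features.put(attribute, piece);
            }
        }
        return features;
    }

    /** What a rule like <token feature='case' value='nom'/> would boil down to. */
    boolean hasFeature(String posTag, String feature, String value) {
        return value.equals(parse(posTag).get(feature));
    }
}
```

With the two config lines above and ":" as the delimiter, hasFeature("subst:sg:nom:m1", "case", "nom") returns true. In a real implementation the parsed map would presumably be computed once per reading and cached, so that a rule check is only a map lookup and a string comparison.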
