Re: POS tag UNKNOWN for SENT_END?

Marcin Miłkowski Wed, 14 Aug 2013 09:21:16 -0700

Hi again,

Actually, I think we should remove AnalyzedGermanToken: it is the only
case where plain AnalyzedToken (AT) is not used. But there is an important
piece of functionality implemented in AGT, namely the analysis of features
in the tags. This should be generalized to all (tagged) languages, so that
it is also usable in XML rules via token features. This way, slow regexes
over POS tags would be replaced with very cheap operations on string
features, and rules would be easier to read.
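To illustrate the cost difference, here is a minimal Java sketch (Java 9+
for Map.of). The tag "subst:sg:nom:m1", the attribute names and the
class/method names are hypothetical, and this is not LanguageTool's actual
AnalyzedToken API. It assumes the features were already extracted from the
tag once, so each rule check is a map lookup instead of a regex scan.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class FeatureMatchDemo {

    // Regex approach: the whole POS tag string is matched on every check.
    // Assumes a colon-separated tag such as "subst:sg:nom:m1" (hypothetical).
    private static final Pattern NOMINATIVE = Pattern.compile("(.*:)?nom(:.*)?");

    static boolean isNominativeByRegex(String posTag) {
        return NOMINATIVE.matcher(posTag).matches();
    }

    // Feature approach: the tag was split into attribute/value pairs once,
    // so the check is a single hash lookup plus a string comparison.
    static boolean hasFeature(Map<String, String> features, String attribute, String value) {
        return value.equals(features.get(attribute));
    }

    public static void main(String[] args) {
        String tag = "subst:sg:nom:m1";
        // Hypothetical pre-extracted features for the tag above.
        Map<String, String> features =
                Map.of("pos", "subst", "num", "sg", "case", "nom", "gender", "m1");
        System.out.println(isNominativeByRegex(tag));            // true
        System.out.println(hasFeature(features, "case", "nom")); // true
        System.out.println(hasFeature(features, "case", "gen")); // false
    }
}
```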


I suggest we have a simple config file for each tagset that defines the
possible attributes and their values, for example:

case = nom acc gen dat voc
num = pl sg

etc. Then one could use <token feature='case' value='nom'/> in XML
rules, and elsewhere. Of course, we'd need some code that parses the
POS tags to extract the attributes. In most cases, we simply need to
define a delimiter (a space or a colon, sometimes the empty string) and
split the POS string on it.
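A rough sketch of the config parsing and tag chopping described above, in
Java. The class name TagsetFeatures, the "attribute = values" line format
read as a list of strings, and the colon-delimited example tag are
assumptions for illustration, not existing LanguageTool code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: reads lines such as "case = nom acc gen dat voc"
// and uses them to turn a POS tag string into attribute/value features.
public class TagsetFeatures {

    // Maps each value (e.g. "nom") back to its attribute (e.g. "case").
    private final Map<String, String> valueToAttribute = new HashMap<>();
    private final String delimiter;

    TagsetFeatures(List<String> configLines, String delimiter) {
        this.delimiter = delimiter;
        for (String line : configLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + line);
            }
            String attribute = trimmed.substring(0, eq).trim();
            for (String value : trimmed.substring(eq + 1).trim().split("\\s+")) {
                if (!value.isEmpty()) {
                    valueToAttribute.put(value, attribute);
                }
            }
        }
    }

    // Chops the POS tag on the delimiter and looks up each chunk's attribute.
    Map<String, String> features(String posTag) {
        String[] chunks = delimiter.isEmpty()
                ? posTag.split("")                        // one value per character
                : posTag.split(Pattern.quote(delimiter)); // literal delimiter
        Map<String, String> result = new LinkedHashMap<>();
        for (String chunk : chunks) {
            String attribute = valueToAttribute.get(chunk);
            if (attribute != null) {
                result.put(attribute, chunk);
            }
            // Chunks not declared in the config (e.g. the word class) are skipped.
        }
        return result;
    }

    public static void main(String[] args) {
        TagsetFeatures tagset = new TagsetFeatures(
                List.of("case = nom acc gen dat voc", "num = pl sg"), ":");
        System.out.println(tagset.features("subst:sg:nom:m1")); // {num=sg, case=nom}
    }
}
```

This naive version assumes every value name is unique across attributes;
a tagset where the same string means different things in different
positions would need a position-aware mapping instead.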

What do you think?

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
