Attribute-value pairs for POS tags [Was: Re: German tests]

Marcin Miłkowski Wed, 04 Mar 2015 09:58:52 -0800

On 2015-03-04 at 13:42, Daniel Naber wrote:
> On 2015-03-04 08:52, Marcin Miłkowski wrote:
>
>> If we could move the first part of the code to another class, which
>> would analyze POS tags to get proper values of attributes, the code
>> would be cleaner and faster. The basic attribute-value class could
>> contain several default attributes (they probably need to be addressable
>> by Strings to make them easily extended by subclasses for new languages
>> and new tagsets), such as number, case, gender, and tense. Not all
>> languages need to have such attribute values in their tagsets, but they
>> need to implement a POS tag analyzer if they want to use these
>> attributes.
>
> I'm not sure I understand this last part: does it mean we would just
> keep the old code as long as there are languages that haven't
> switched to the new attribute values?


I don't want to switch. I think it may still be useful to use full POS 
tags and regexes over POS tags, for example in English with its largely 
non-positional Penn tagset. Moreover, there will always be languages 
without a POS tagger. They don't need to implement the new 
attribute-value interface at all. I see this as an additional member 
(precomputed from the POS tag string value) with its own getters and 
setters. If you don't implement this for a language, then the getter 
will simply return an empty map for all tokens, and that's it.
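
To make the "additional member" idea concrete, here is a rough sketch. TokenSketch is a made-up stand-in for AnalyzedToken, and the method names getAttributes/setAttributes are my assumption, not existing API. The only point is that the map defaults to empty, so languages without an analyzer need no extra code:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for AnalyzedToken; not the real LanguageTool class.
class TokenSketch {
    private final String token;
    private final String posTag;
    // Precomputed from posTag by a language-specific analyzer.
    // Stays empty when the language does not provide one.
    private Map<String, String> attributes = Collections.emptyMap();

    TokenSketch(String token, String posTag) {
        this.token = token;
        this.posTag = posTag;
    }

    String getToken() { return token; }

    // The full POS tag string stays available for regex-based rules.
    String getPOSTag() { return posTag; }

    Map<String, String> getAttributes() { return attributes; }

    void setAttributes(Map<String, String> newAttributes) {
        // Defensive copy so callers cannot modify the stored map later.
        this.attributes =
            Collections.unmodifiableMap(new LinkedHashMap<>(newAttributes));
    }
}

class TokenSketchDemo {
    public static void main(String[] args) {
        // English with a Penn-style tag: nobody sets attributes.
        TokenSketch walks = new TokenSketch("walks", "VBZ");
        System.out.println(walks.getPOSTag() + " " + walks.getAttributes());
    }
}
```

Rules that match on the POS tag string keep working unchanged; only code that asks for attributes sees the new map.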

>
> Also, while I like the idea, it looks similar to what I tried in branch
> readable-pos-tags but had to give up as it became too much:
> http://www.mail-archive.com/search?l=languagetool-devel@lists.sourceforge.net&q=subject:%22readable+POS+tags%22&o=newest&f=1
> The focus in that branch was on having the attributes also in the XML
> files, but other than that your approach is similar, isn't it?

Yes, the idea is similar. I think we could approach this step by step to 
avoid changing too much:

- first encapsulate the attribute-setting code in the Unifier class,
- move it to a separate class,
- reuse that code in the AnalyzedToken class,
- remove code for equivalences from XML files and replace that with 
language-dependent Java classes that would implement appropriate 
attribute-value classes stored as members in the AnalyzedToken.

I think the easiest way would be to subclass a generic POS tag analyzer 
class.
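
Roughly what I have in mind for the subclassing, as a sketch only. All class names are hypothetical, and the colon-separated tag format "subst:sg:nom:m" is invented for illustration rather than taken from any real tagset in the project:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic analyzer: attributes are addressed by Strings so
// that subclasses can add new ones for new languages and tagsets.
abstract class PosTagAnalyzerSketch {
    static final String NUMBER = "number";
    static final String CASE = "case";
    static final String GENDER = "gender";
    static final String TENSE = "tense";

    // Called once per tag; the result would be stored in the token.
    Map<String, String> analyze(String posTag) {
        if (posTag == null || posTag.isEmpty()) {
            return Collections.emptyMap();
        }
        Map<String, String> result = new LinkedHashMap<>();
        fill(posTag, result);
        return result;
    }

    // Language-specific part: map one tag to attribute-value pairs.
    protected abstract void fill(String posTag, Map<String, String> out);
}

// Assumed positional tagset of the form "subst:sg:nom:m" (illustration only).
class ColonTagAnalyzerSketch extends PosTagAnalyzerSketch {
    @Override
    protected void fill(String posTag, Map<String, String> out) {
        String[] parts = posTag.split(":");
        if (parts.length >= 4 && "subst".equals(parts[0])) {
            out.put(NUMBER, parts[1]);
            out.put(CASE, parts[2]);
            out.put(GENDER, parts[3]);
        }
        // Tags this sketch does not understand leave the map empty.
    }
}

class PosTagAnalyzerDemo {
    public static void main(String[] args) {
        PosTagAnalyzerSketch analyzer = new ColonTagAnalyzerSketch();
        System.out.println(analyzer.analyze("subst:sg:nom:m"));
    }
}
```

The unifier could then compare precomputed attribute values instead of running regexes over the tag string for every token.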

What was the biggest issue in your branch aside from the complexity?

Regards,
Marcin

