On 2014-03-25 09:35, Daniel Naber wrote:
> Hi,
>
> I've written an overview of how we could use readable POS tags in LT:
>
> http://wiki.languagetool.org/readable-part-of-speech-tags
>
> The core part however - how do these new POS tags actually look like -
> is still missing. Any input on the overview and ideas about that core
> part is welcome.

I think that we should not change the existing POS tags, as there are 
features in some languages that are not found in others. For example, 
Polish verbs have perfective or imperfective aspect, and there are also 
reflexive verbs and partially nonreflexive verbs, special agglutinates 
etc. I don't think it will be easy to find a superset of all possible 
features needed, even by using ISOcat, also because I introduced some 
helper POS tags for rules myself. For this reason, I think that values 
and keys should be configurable per tagset. Also, if someone uses a
Polish-only feature in a grammar file for English, it should be
rejected automatically. We can enforce this using XML namespaces.

At the same time, I don't think that mapping is ever required. What we 
need is one-time parsing of POS tags into key-value pairs, which 
basically boils down to storing some values in class members. Then a 
standard getter would be enough, and that is really computationally cheap.

Let me give an example. This is a POS tag for a preposition that 
requires accusative:

prep:acc

We would store the following values:

pos = preposition

case = accusative
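
To make the idea concrete, here is a minimal sketch in Java of such
one-time parsing. The class name ParsedTag, its fields, and the small
abbreviation tables are assumptions made up for this illustration; they
are not existing LT code, and a real tagset would define its own values.

```java
/** Hypothetical holder for a parsed POS tag; all names here are illustrative. */
final class ParsedTag {
    private final String pos;      // part of speech, e.g. "preposition"
    private final String gramCase; // grammatical case, or null if the tag has none

    private ParsedTag(String pos, String gramCase) {
        this.pos = pos;
        this.gramCase = gramCase;
    }

    /** Parses a colon-separated tag such as "prep:acc" once, up front. */
    static ParsedTag parse(String tag) {
        String[] parts = tag.split(":");
        String pos = expandPos(parts[0]);
        String gramCase = parts.length > 1 ? expandCase(parts[1]) : null;
        return new ParsedTag(pos, gramCase);
    }

    // Assumed abbreviation tables; a real tagset would define its own.
    private static String expandPos(String abbr) {
        switch (abbr) {
            case "prep": return "preposition";
            case "adj":  return "adjective";
            default:     return abbr; // unknown abbreviations are kept as-is
        }
    }

    private static String expandCase(String abbr) {
        switch (abbr) {
            case "nom": return "nominative";
            case "acc": return "accusative";
            default:    return null; // not a case value known to this sketch
        }
    }

    // Plain getters: no lookup structure is consulted after parsing.
    String getPos()  { return pos; }
    String getCase() { return gramCase; }
}
```

With this sketch, ParsedTag.parse("prep:acc").getPos() returns
"preposition" and getCase() returns "accusative"; after the single
parse, every later access is just a field read.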

A slightly more complex problem is that some tags carry alternative
readings (due to syntactic ambiguity), so we might have an adjective
that shares its form in accusative and nominative. One very easy way
to deal with this is to treat the cases as bit-flag constants and
combine them into a single integer. A single bitwise AND is then
enough to check whether a given value is present.
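
A rough sketch of the bit-flag idea in Java follows. The class
CaseFlags, the constant names, and the dot-separated input format
("nom.acc") are assumptions chosen for this example, not a description
of existing LT code.

```java
/** Hypothetical bit-flag encoding of grammatical cases; names are illustrative. */
final class CaseFlags {
    static final int NOM = 1;      // nominative
    static final int GEN = 1 << 1; // genitive
    static final int DAT = 1 << 2; // dative
    static final int ACC = 1 << 3; // accusative

    private CaseFlags() { }

    /** Combines dot-separated case abbreviations (assumed format) into one mask. */
    static int parse(String cases) {
        int mask = 0;
        for (String c : cases.split("\\.")) {
            switch (c) {
                case "nom": mask |= NOM; break;
                case "gen": mask |= GEN; break;
                case "dat": mask |= DAT; break;
                case "acc": mask |= ACC; break;
                default:
                    throw new IllegalArgumentException("unknown case: " + c);
            }
        }
        return mask;
    }

    /** True if the mask contains the given case flag. */
    static boolean hasCase(int mask, int flag) {
        return (mask & flag) != 0;
    }
}
```

For an adjective form that is both nominative and accusative,
CaseFlags.parse("nom.acc") sets two bits in one int, so
hasCase(mask, CaseFlags.ACC) and hasCase(mask, CaseFlags.NOM) are both
true while hasCase(mask, CaseFlags.GEN) is false.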

As far as I can see, no HashMaps are required at all, just a consistent
way of interpreting the values stored in class members.

Regards,
Marcin
