On 2015-03-04 at 13:42, Daniel Naber wrote:
> On 2015-03-04 08:52, Marcin Miłkowski wrote:
>
>> If we could move the first part of the code to another class, which
>> would analyze POS tags to get proper values of attributes, the code
>> would be cleaner and faster. The basic attribute-value class could
>> contain several default attributes (they probably need to be
>> addressable by Strings so that subclasses can easily extend them for
>> new languages and new tagsets), such as number, case, gender, and
>> tense. Not all languages need to have such attribute values in their
>> tagsets, but they need to implement a POS tag analyzer if they want
>> to use these attributes.
>
> I'm not sure I understand this last part: does it mean we would just
> keep the old code as long as there are languages that haven't
> switched to the new attribute values?
I don't want to switch. I think it may still be useful to use full POS tags and regexes over POS tags, for example in English with its largely non-positional Penn tagset. Moreover, there will always be languages without a POS tagger. They don't need to implement the new attribute-value interface at all. I see this as an additional member (precomputed from the POS tag string value) with its own getters and setters. If you don't implement this for a language, the getter will simply return an empty map for all tokens, and that's it.

> Also, while I like the idea, it looks similar to what I tried in branch
> readable-pos-tags but had to give up as it became too much:
> http://www.mail-archive.com/search?l=languagetool-devel@lists.sourceforge.net&q=subject:%22readable+POS+tags%22&o=newest&f=1
> The focus in that branch was on having the attributes also in the XML
> files, but other than that your approach is similar, isn't it?

Yes, the idea is similar. I think we could approach this step by step to avoid changing too much:

- first encapsulate the attribute-setting code in the Unifier class,
- move it to a separate class,
- reuse that code in the AnalyzedToken class,
- remove the code for equivalences from the XML files and replace it with language-dependent Java classes that implement the appropriate attribute-value classes, stored as members of AnalyzedToken.

I think the easiest way would be to subclass a generic POS tag analyzer class.

What was the biggest issue in your branch, aside from the complexity?

Regards,
Marcin
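
To make the proposal above a bit more concrete, here is a minimal sketch of how a generic POS tag analyzer with an empty-map default might look. All names here (`PosTagAnalyzer`, `ToyPositionalAnalyzer`, `TokenReading`) are hypothetical and are not existing LanguageTool classes; `TokenReading` only stands in for AnalyzedToken, and the colon-separated tag layout is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic analyzer: languages that do not override it
// get an empty attribute map for every token.
class PosTagAnalyzer {
  /** Maps a POS tag to attribute name -> value; empty by default. */
  Map<String, String> analyze(String posTag) {
    return Collections.emptyMap();
  }
}

// Hypothetical language-specific subclass for an assumed positional
// tagset of the form "pos:number:case:gender", e.g. "subst:sg:nom:m".
class ToyPositionalAnalyzer extends PosTagAnalyzer {
  // Attributes are addressed by Strings so subclasses can add new ones.
  private static final String[] SLOTS = {"pos", "number", "case", "gender"};

  @Override
  Map<String, String> analyze(String posTag) {
    if (posTag == null || posTag.isEmpty()) {
      return Collections.emptyMap();
    }
    Map<String, String> attributes = new LinkedHashMap<>();
    String[] parts = posTag.split(":");
    for (int i = 0; i < parts.length && i < SLOTS.length; i++) {
      attributes.put(SLOTS[i], parts[i]);
    }
    return Collections.unmodifiableMap(attributes);
  }
}

// Stand-in for a token reading (NOT the real AnalyzedToken): the
// attribute map is precomputed once from the POS tag string.
class TokenReading {
  private final String posTag;
  private final Map<String, String> attributes;

  TokenReading(String posTag, PosTagAnalyzer analyzer) {
    this.posTag = posTag;
    this.attributes = analyzer.analyze(posTag);
  }

  String getPosTag() {
    return posTag;
  }

  Map<String, String> getAttributes() {
    return attributes;
  }
}
```

With this shape, `new TokenReading("NN", new PosTagAnalyzer())` keeps its full POS tag for regex matching while its attribute map stays empty, whereas a reading built with `ToyPositionalAnalyzer` exposes `number`, `case` and `gender` directly. Whether the map should be precomputed in the constructor or filled lazily is an open design choice.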