Hi there,

> I would like to attach attributes to lemmas. Only a few, but maybe there
> could be more, so some way of introducing an attribute name would be nice,
> instead of having a predefined set of attribute names.

Lemmas are not represented as separate objects in Apertium dictionaries. They are part of the lexical forms: one could say that the lemma is the material from the beginning of the lexical form up to where the first part-of-speech tag appears. For instance, for the surface form "thought" an English dictionary would derive the lexical forms "thought<n><sg>" and "think<vblex>...". The lemmas would then be "thought" and "think". There is an attribute lm="...." in some entries, but it is optional.

> I believe there are already lemma attributes, such as the word class of
> the lemma: noun, verb, adjective, adverb etc.

Not for lemmas. Lemma information is encoded as the content of the element (see above). Part of speech, as well as other morphological information, is encoded as attributes of the <s> (symbol) element.

> what I have in mind is to attach data from wordnet, such as sense,
> hypernym, hyponym, holonym, meronym, and also combine it with the
> Swedish SALDO attributes of father and mother relations.
>
> The idea is then to choose a sense of a homonym based on the shortest
> distance to maybe the previous and following five words.
>
> a lemma may have more than one sense. Eg 'nut' may mean several things
> such as the offspring of a plant, nuts and bolts, and testicles.
>
> Is this easy to do? How do I do it?

I think the attribute lm="...." could be stretched a bit to take any value, which could then be used to identify the lemma in another structure containing all of this information (for instance, by giving an XPath into another XML file that holds the desired data).
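To make the two points above a bit more concrete, here is a minimal Python sketch, not anything that exists in Apertium. It assumes lexical forms look like "thought<n><sg>", takes the lemma to be everything before the first tag, and uses that lemma as the key into a separate standoff table. The names lemma_of, tags_of, senses_for and STANDOFF are hypothetical, and the sense labels are placeholders rather than real WordNet or SALDO data.

```python
import re

# Hypothetical standoff table keyed by lemma. In practice this could be
# loaded from a separate XML file; the sense labels are placeholders only.
STANDOFF = {
    "nut": ["seed of a plant", "fastener used with a bolt"],
}


def lemma_of(lexical_form):
    """Return the material before the first <tag> of a lexical form."""
    return lexical_form.split("<", 1)[0]


def tags_of(lexical_form):
    """Return the tag names of a lexical form, in order."""
    return re.findall(r"<([^<>]+)>", lexical_form)


def senses_for(lexical_form):
    """Look up standoff senses by lemma; empty list if nothing is attached."""
    return STANDOFF.get(lemma_of(lexical_form), [])


if __name__ == "__main__":
    for form in ("thought<n><sg>", "think<vblex>", "nut<n><sg>"):
        print(form, lemma_of(form), tags_of(form), senses_for(form))
```

Keeping the extra information in a separate table means the dictionary entries themselves stay unchanged; only the key (here the lemma, possibly the lm value) has to be agreed on.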
Perhaps it would be better to have some kind of new general-purpose attribute that could be used to attach *standoff* information of this kind to any entry <e>.

Fran is working on lexical selection and I'm sure his opinion would be interesting to read!

All the best

Mikel

> best regards
> keld

-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
