On Tue, 05 Jul 2011 at 16:49 +0200, Mikel Forcada wrote:
> Hi there,
> > I would like to attach attributes to lemmas. Only a few but maybe there
> > could be more, so a kind of introducing an attribute name would be nice,
> > instead of having a predefined set of attribute names..
> Lemmas as such aren't represented in Apertium dictionaries. They
> are part of the lexical forms (one could say that the lemma is the
> material from the beginning of the lexical form up to where the first
> part-of-speech tag appears). For instance, for surface form "thought" an
> English dictionary would derive the lexical forms "thought<n><sg>" and
> "think<vblex>...". The lemmas would then be "thought" and "think". There
> is an attribute lm="...." in some entries, but it is optional.
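A minimal sketch of that lemma/tag split, in Python. The function name split_lexical_form is made up for illustration and is not part of any Apertium tool; it only assumes that a lexical form is a lemma followed by zero or more <tag> symbols, as described above.

```python
import re

# Matches one <tag> symbol and captures its name.
TAG_RE = re.compile(r"<([^<>]+)>")


def split_lexical_form(lexical_form):
    """Split a lexical form into (lemma, [tags]).

    The lemma is everything before the first '<'; the tags are the
    names of the <...> symbols that follow it.
    """
    first_tag = lexical_form.find("<")
    if first_tag == -1:
        # No tags at all: the whole string is treated as the lemma.
        return lexical_form, []
    lemma = lexical_form[:first_tag]
    tags = TAG_RE.findall(lexical_form[first_tag:])
    return lemma, tags


print(split_lexical_form("thought<n><sg>"))  # ('thought', ['n', 'sg'])
```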
> > I believe there are already lemma attributes, such as the word class of
> > the lemma: noun, verb, adjective, adverb etc.
> Not for lemmas. Lemma information is encoded as the content of
> the element (see above). Part of speech as well as other morphological
> information is encoded as attributes of the <s> (symbol) element.
> > What I have in mind is to attach data from WordNet, such as sense,
> > hypernym, hyponym, holonym, meronym, and also combine it with the
> > Swedish SALDO attributes of father and mother relations.
> >
> > The idea is then to choose a sense of a homonym based on the shortest
> > distance to maybe the previous and following five words.
Which language pair(s) are you working with? Is it really necessary?
> > A lemma may have more than one sense. E.g. 'nut' may mean several things
> > such as the offspring of a plant, nuts and bolts, and testicles.
> >
> > Is this easy to do? How do I do it?
> I think the attribute lm="...." could be stretched a bit to have any
> value, which could be used to identify the lemma in another structure
> which could contain all of these (for instance, giving an XPath to
> another XML file containing all the desired information).
>
> Perhaps it would be better to have some kind of new general purpose
> attribute that could be used to attach *standoff* information of this
> kind to any entry <e>.
I think that might be nice... also for, say, verb valency or
other features that we don't necessarily want to represent with tags.
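A rough sketch of how such a standoff lookup might work, in Python, using lm="..." as the key into a second XML file. Everything about the standoff file here is an assumption made up for illustration (the <lemmas>, <lemma id="...">, and <sense hypernym="..."> names, and the sample data); nothing like it exists in Apertium today. The dictionary entry is a toy example as well.

```python
import xml.etree.ElementTree as ET

# Toy dictionary fragment; the entry and paradigm name are illustrative.
DIX = """<section>
  <e lm="nut"><i>nut</i><par n="house__n"/></e>
</section>"""

# Hypothetical standoff file: element and attribute names are invented.
STANDOFF = """<lemmas>
  <lemma id="nut">
    <sense hypernym="seed">fruit of a plant</sense>
    <sense hypernym="fastener">nuts and bolts</sense>
  </lemma>
</lemmas>"""


def senses_for_entries(dix_xml, standoff_xml):
    """Map each entry's lm value to the senses listed in the standoff file."""
    dix = ET.fromstring(dix_xml)
    standoff = ET.fromstring(standoff_xml)
    # Index the standoff records by their id so entries can look them up.
    index = {node.get("id"): node for node in standoff.iter("lemma")}
    result = {}
    for entry in dix.iter("e"):
        key = entry.get("lm")
        node = index.get(key)
        if node is None:
            result[key] = []  # no standoff information for this entry
        else:
            result[key] = [(s.get("hypernym"), s.text) for s in node.iter("sense")]
    return result


print(senses_for_entries(DIX, STANDOFF))
```

The dictionary itself stays untouched this way; only the external file grows as more attributes (valency, WordNet relations, and so on) are added.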
> Fran is working on lexical selection and I'm sure his opinion would be
> interesting to read!
One could also use the attribute 'c' for comment.
In the case where a word has the same lemma/pos/gender but different
paradigms/declensions, I use a pseudo-lemma, for example from Russian:
<e lm="язык"><i>язык</i><par n="долг__n_m_nn"/></e>
<e lm="язык"><p><l>язык</l><r>язык:1</r></p><par n="маг__n_m_aa"/></e>
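To make the effect of the pseudo-lemma visible, here is a small Python sketch that reads the two entries above and lists, for each, the lm value, the right-side (lexical form) string, and the paradigm. The wrapping <section> element and the helper name describe_entries are only there for the example, and the code assumes entries as simple as these two (plain text inside <i>, <l>, and <r>).

```python
import xml.etree.ElementTree as ET

# The two entries from above, wrapped in a root element so they parse.
ENTRIES = """<section>
<e lm="язык"><i>язык</i><par n="долг__n_m_nn"/></e>
<e lm="язык"><p><l>язык</l><r>язык:1</r></p><par n="маг__n_m_aa"/></e>
</section>"""


def describe_entries(xml_text):
    """Return (lm, right-side string, paradigm name) for each <e> entry."""
    rows = []
    for entry in ET.fromstring(xml_text).iter("e"):
        identity = entry.find("i")
        pair = entry.find("p")
        if identity is not None:
            # <i> means left and right sides are identical.
            right = identity.text
        elif pair is not None:
            # <p> carries separate <l> (surface) and <r> (lexical) sides.
            right = pair.find("r").text
        else:
            right = None
        paradigm = entry.find("par")
        rows.append((entry.get("lm"), right,
                     paradigm.get("n") if paradigm is not None else None))
    return rows


for row in describe_entries(ENTRIES):
    print(row)
```

Both entries share lm="язык", but the second one ends up with "язык:1" on the right side, which is what keeps the two declensions apart.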
Fran