Dear Hugh,

Actually, there is nif:lemma in NIF 2.0 (
https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html#d4e751).
It's a datatype property where you just give the string value.
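
A minimal sketch of what this could look like in Turtle, for the sentence
"Peter came home." The document URI http://example.org/doc.txt and the
RFC 5147-style fragment identifiers are placeholders chosen for
illustration, not something prescribed by your data. Offsets are
zero-based with an exclusive end, as is usual in NIF.

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The whole text as a NIF context (hypothetical document URI).
<http://example.org/doc.txt#char=0,16>
    a nif:Context, nif:RFC5147String ;
    nif:isString "Peter came home." ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "16"^^xsd:nonNegativeInteger .

# The token "came" (characters 6 up to, but not including, 10),
# with its lemma given as a plain string.
<http://example.org/doc.txt#char=6,10>
    a nif:Word, nif:RFC5147String ;
    nif:referenceContext <http://example.org/doc.txt#char=0,16> ;
    nif:anchorOf "came" ;
    nif:beginIndex "6"^^xsd:nonNegativeInteger ;
    nif:endIndex "10"^^xsd:nonNegativeInteger ;
    nif:lemma "come" .
```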

If you need an object property, you could model your lemmas as lexical
entries and use OntoLex. For plain lemmatization, this would be overkill,
but if you want to add metadata about your lemmas or link them to
additional information, say, derivational information or a proper
dictionary, that would be the most conventional way of doing things in the
context of NIF.

Example:
Peter came home.

Lemma entry:
:come_le a ontolex:Word; ontolex:canonicalForm [ ontolex:writtenRep
"come"@en ].

As for the linking between lemmas and corpus, you could use either Web
Annotation (and annotate the text with lemmas = lexical entries [here]) or
OntoLex-FrAC.

Web Annotation (https://www.w3.org/TR/annotation-model,
https://www.w3.org/ns/oa):

[ a oa:Annotation ] oa:hasBody :come_le; oa:hasTarget [ a
oa:TextPositionSelector; oa:start "6"; oa:end "10" ].
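
Spelled out more fully, and only as a sketch: in the Web Annotation data
model, oa:start and oa:end belong to oa:TextPositionSelector (zero-based,
end exclusive), and a selector is normally attached to an
oa:SpecificResource that also names the source document. The document URI
and the lexicon namespace below are hypothetical placeholders.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/lexicon#> .

# Annotation whose body is the lexical entry and whose target is
# the span "came" in "Peter came home."
[] a oa:Annotation ;
    oa:hasBody :come_le ;
    oa:hasTarget [
        a oa:SpecificResource ;
        oa:hasSource <http://example.org/doc.txt> ;   # hypothetical document
        oa:hasSelector [
            a oa:TextPositionSelector ;
            oa:start "6"^^xsd:nonNegativeInteger ;
            oa:end "10"^^xsd:nonNegativeInteger
        ]
    ] .
```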

OntoLex-FrAC (
https://github.com/ontolex/frequency-attestation-corpus-information):

:come_le frac:attestation [ a frac:Attestation; frac:locus ... ].
(The object of frac:locus could be a Web Annotation selector or a NIF String URI.)
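
As one possible instantiation, here is a sketch with a NIF String URI as
the locus. The document URI is a hypothetical placeholder, and the frac:
namespace is the one used in the current FrAC draft as far as I know, so
please check it against the module before relying on it.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix frac:    <http://www.w3.org/ns/lemon/frac#> .
@prefix :        <http://example.org/lexicon#> .

:come_le a ontolex:Word ;
    ontolex:canonicalForm [ ontolex:writtenRep "come"@en ] ;
    frac:attestation [
        a frac:Attestation ;
        # Locus given as a NIF String URI for "came" (hypothetical document URI).
        frac:locus <http://example.org/doc.txt#char=6,10>
    ] .
```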

If you'd like to avoid treating lemmas as lexical entries (because they
typically don't have parts of speech, for example), you can also use plain
Web Annotation:

[ a oa:Annotation, my_types:Lemma ] oa:bodyValue "come"; oa:hasTarget [
a oa:TextPositionSelector; oa:start "6"; oa:end "10" ].

Here, my_types:Lemma is a placeholder for whatever class you introduce to
define lemmas.

I would recommend the following order of preference:
1. nif:lemma if you just have a lemma string and work with NIF anyway
2. OntoLex with frac:attestation if you can cast your lemmas as a lexical
entry
3. Web Annotation with oa:bodyValue if you have strong opinions on your
lemmas being something other than lexical entries.

Note that you are free to use NIF Strings in place of Web Annotation
selectors (which saves you 3 triples), but people in the Web Annotation
community would probably prefer their established ways.

Best,
Christian

PS: I won't debate the use and future of lemmatization here. Lemmas won't
disappear from linguistics, language teaching or philological practice,
regardless of what happens in NLP.


On Tue, 10 Oct 2023 at 00:15, Hugh Paterson III via Corpora <
[email protected]> wrote:

> Greetings,
>
> I am working on a project which is using lemmatization. I'm wondering how
> people have approached combining NIF and lemmatization. Are there any
> "blessed" extensions or ontologies?
> I'm not seeing nif:lemma as an option within the nif ontology... though I
> am likely missing something.
>
> Kind regards,
> - Hugh
> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]
>