Related to "The Bridge" is my own Greek Learner Texts Project (https://greek-learner-texts.org), which relies heavily on lemmatization for building vocabulary lists.
At the Perseus Digital Library https://scaife.perseus.org , we also make extensive use of lemmatization of texts to link to dictionaries, etc.

James

On Tue, Oct 10, 2023 at 1:11 PM Hugh Paterson III via Corpora <[email protected]> wrote:

> Hi Ada, good to hear from you.
>
> The project is called "The Bridge": https://bridge.haverford.edu/
> I am not the PI. The project has been in existence for about 12 years.
> I was invited to become involved through my Drexel LEADING Fellowship.
> Here is a paper we published this summer:
> https://hughandbecky.us/Hugh-CV/publication/2023-bridging-corpora/4LR_pre_print.pdf
>
> The Bridge is a linked data application supporting curriculum development.
> It was developed with Latin in mind, but has been extended to Greek as
> well. It quickly helps instructors and students find new vocabulary words
> in newly assigned texts, based on texts they have already encountered in
> their curriculum.
>
> The current workflow takes a variety of texts from several sources and
> then stores the lemmas for comparison across texts and broad statistics
> generation. I see value in modeling the whole text, not just the lemmas,
> as this may enable future services. So, while NIF could model the whole
> text, the current operational activities really only involve using lemmas.
> To move forward in a linked data model we need to support current
> operations. More broadly, I see the lemmas as an "annotation" or
> abstraction layer, whereas I would see the actual content of the texts as
> the "source data". Using linked data and lemmas allows The Bridge to
> connect via lemmas to LiLa data: https://lila-erc.eu/
>
> Kind regards,
> Hugh
>
>
> On Tue, Oct 10, 2023 at 3:39 AM Ada Wan <[email protected]> wrote:
>
>> Dear Hugh,
>>
>> What project are you working on that still requires lemmatization? Would
>> it not be a better approach to use (sub-)character n-grams (especially if
>> you are doing textual analysis/interpretation, vs. processing, which can
>> be byte-based) to determine which segments occur most frequently first,
>> and (re-)analyze from there?
>> I understand there has been a habit in the "language space" to call
>> certain segments "lemmata". I am curious to know what one can do as a
>> community, though, to transition to more general methods (and
>> interpretations of "language").
>>
>> Thanks and best,
>> Ada
>>
>>
>> On Tue, Oct 10, 2023 at 12:15 AM Hugh Paterson III via Corpora <[email protected]> wrote:
>>
>>> Greetings,
>>>
>>> I am working on a project which uses lemmatization. I'm wondering how
>>> people have approached combining NIF and lemmatization. Are there any
>>> "blessed" extensions or ontologies?
>>> I'm not seeing nif:lemma as an option within the NIF ontology... though
>>> I am likely missing something.
>>>
>>> Kind regards,
>>> - Hugh
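[Editor's note: the lemma-comparison step Hugh describes, finding new vocabulary in an assigned text relative to texts already covered, could be sketched as below. This is a minimal illustration assuming lemmatization has already been done; all names are invented here and do not reflect The Bridge's actual implementation.]

```python
# Given lemmatized texts, find which lemmas in a newly assigned text
# are new relative to texts already encountered in the curriculum.
from collections import Counter

def new_vocabulary(assigned_lemmas, seen_texts):
    """Return lemmas in the assigned text absent from all seen texts,
    with their frequencies in the assigned text."""
    seen = set()
    for lemmas in seen_texts:
        seen.update(lemmas)
    counts = Counter(assigned_lemmas)
    return {lemma: n for lemma, n in counts.items() if lemma not in seen}

# Toy example with already-lemmatized Latin tokens:
seen_texts = [["puella", "aqua", "porto"], ["aqua", "video"]]
assigned = ["puella", "bellum", "bellum", "video", "gero"]
print(new_vocabulary(assigned, seen_texts))  # {'bellum': 2, 'gero': 1}
```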
_______________________________________________ Corpora mailing list -- [email protected] https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to [email protected]
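[Editor's note: the character n-gram approach Ada suggests, surfacing the most frequent segments of a text before any linguistic analysis, could be sketched as below. This is purely illustrative; no specific tooling from the thread is implied.]

```python
# Count all overlapping character n-grams in a text and report the
# most frequent segments.
from collections import Counter

def char_ngrams(text, n):
    """All overlapping character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

text = "the theater thinned"
counts = Counter(char_ngrams(text, 3))
print(counts.most_common(3))
```

From such frequency profiles one could then (re-)analyze which segments merit treatment as units, rather than assuming lemmata up front.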
