Related to "The Bridge" is my own Greek Learner Texts Project (https://greek-learner-texts.org), which relies heavily on lemmatization for building vocabulary lists.
At the Perseus Digital Library https://scaife.perseus.org , we also make extensive use of lemmatization of texts to link to dictionaries, etc.

James

On Tue, Oct 10, 2023 at 1:11 PM Hugh Paterson III via Corpora <[email protected]> wrote:

> Hi Ada, good to hear from you.
>
> The project is called "The Bridge": https://bridge.haverford.edu/
> I am not the PI. The project has been in existence for about 12 years.
> I was invited to become involved through my Drexel LEADING Fellowship.
> Here is a paper we published this summer:
> https://hughandbecky.us/Hugh-CV/publication/2023-bridging-corpora/4LR_pre_print.pdf
>
> The Bridge is a linked data application supporting curriculum development.
> It was developed with Latin in mind, but has been extended to Greek as
> well. It quickly helps instructors and students find new vocabulary words
> in newly assigned texts, based on texts they have already encountered in
> their curriculum.
>
> The current workflow takes a variety of texts from several sources and
> then stores the lemmas for comparison across texts and broad statistics
> generation. I see value in modeling the whole text, not just the lemmas,
> as this may enable future services. So, while NIF could model the whole
> text, the current operational activities really only involve using lemmas.
> To move forward in a linked data model we need to support current
> operations. More broadly, I see the lemmas as an "annotation" or
> abstraction layer, whereas I would see the actual content of the texts as
> the "source data". Using linked data and lemmas allows The Bridge to
> connect via lemmas to LiLa data: https://lila-erc.eu/
>
> Kind regards,
> Hugh
>
>
> On Tue, Oct 10, 2023 at 3:39 AM Ada Wan <[email protected]> wrote:
>
>> Dear Hugh,
>>
>> What project are you working on that still requires lemmatization? Would
>> it not be a better approach to use (sub-)character n-grams (especially if
>> you are doing textual analysis/interpretation, vs. processing, which can
>> be byte-based) to determine which segments occur most frequently first,
>> and (re-)analyze from there?
>> I understand there has been a habit in the "language space" to call
>> certain segments "lemmata". I am curious to know what one can do as a
>> community, though, to transition to more general methods (and
>> interpretations of "language").
>>
>> Thanks and best,
>> Ada
>>
>>
>> On Tue, Oct 10, 2023 at 12:15 AM Hugh Paterson III via Corpora <[email protected]> wrote:
>>
>>> Greetings,
>>>
>>> I am working on a project which uses lemmatization. I'm wondering how
>>> people have approached combining NIF and lemmatization. Are there any
>>> "blessed" extensions or ontologies?
>>> I'm not seeing nif:lemma as an option within the NIF ontology... though
>>> I am likely missing something.
>>>
>>> Kind regards,
>>> - Hugh
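[Editor's note: the lemma-comparison step Hugh describes, finding new vocabulary in an assigned text relative to texts already covered, could be sketched as below. This is a minimal illustration assuming lemmatization has already been done; all names are invented here and do not reflect The Bridge's actual implementation.]

```python
# Given lemmatized texts, find which lemmas in a newly assigned text
# are new relative to texts already encountered in the curriculum.
from collections import Counter

def new_vocabulary(assigned_lemmas, seen_texts):
    """Return lemmas in the assigned text absent from all seen texts,
    with their frequencies in the assigned text."""
    seen = set()
    for lemmas in seen_texts:
        seen.update(lemmas)
    counts = Counter(assigned_lemmas)
    return {lemma: n for lemma, n in counts.items() if lemma not in seen}

# Toy example with already-lemmatized Latin tokens:
seen_texts = [["puella", "aqua", "porto"], ["aqua", "video"]]
assigned = ["puella", "bellum", "bellum", "video", "gero"]
print(new_vocabulary(assigned, seen_texts))  # {'bellum': 2, 'gero': 1}
```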
_______________________________________________ Corpora mailing list -- [email protected] https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to [email protected]
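[Editor's note: the character n-gram approach Ada suggests, surfacing the most frequent segments of a text before any linguistic analysis, could be sketched as below. This is purely illustrative; no specific tooling from the thread is implied.]

```python
# Count all overlapping character n-grams in a text and report the
# most frequent segments.
from collections import Counter

def char_ngrams(text, n):
    """All overlapping character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

text = "the theater thinned"
counts = Counter(char_ngrams(text, 3))
print(counts.most_common(3))
```

From such frequency profiles one could then (re-)analyze which segments merit treatment as units, rather than assuming lemmata up front.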
