Dear Alexander,

within the Czech AHISTO project we have OCRed about 300,000 pages from Czech medieval sources FONTES related to the Hussite era.
The current corpus data contain more than 3 million sentences (84 million tokens), mostly in Old Czech (36 million tokens), German and Latin.

The corpus is available for download at
https://nlp.fi.muni.cz/trac/ahisto/wiki/NerDataset#Corpus

kind regards,
--
Ales Horak
Natural Language Processing Centre (NLP Centre)
Faculty of Informatics
Masaryk University
Brno, Czech Republic

Alexander Osherenko via Corpora wrote on Mar 29, 2023:
> Hi,
>
> I'm looking for digital old church Slavonic resources such as corpora,
> treebanks, wordnets or raw texts. I am aware of the GORAZD: The Old Church
> Slavonic Digital Hub <http://www.gorazd.org/?q=en/node/21> or the TOROT
> treebank at https://universaldependencies.org, but maybe I miss something.
> Thanks, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Research Associate
> Bavarian Academy of Sciences and Humanities <http://badw.de/>
> Profile: Socioware Development <http://www.socioware.de/osherenko_page.html>
> Profile: Humboldt-Universität zu Berlin
> <https://wirsindhumboldt.de/de/VKkZNyFaeu>
> Profile: ResearchGate
> <https://www.researchgate.net/profile/Alexander_Osherenko>
> Channel: Youtube <https://www.youtube.com/user/MrOsherenko>
> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]
