Dear Alexander,

Within the Czech AHISTO project, we have OCRed about 300,000
pages from the Czech medieval sources FONTES related to the Hussite
era.

The current corpus data contain more than 3 million sentences
(84 million tokens) mostly in Old Czech (36 million tokens),
German and Latin. The corpus is available for download at
https://nlp.fi.muni.cz/trac/ahisto/wiki/NerDataset#Corpus

Kind regards,
-- 
Ales Horak
Natural Language Processing Centre (NLP Centre)
Faculty of Informatics
Masaryk University
Brno, Czech Republic



Alexander Osherenko via Corpora wrote on Mar 29, 2023:
> Hi,
> 
> I'm looking for digital Old Church Slavonic resources such as corpora,
> treebanks, wordnets or raw texts. I am aware of the GORAZD: The Old Church
> Slavonic Digital Hub <http://www.gorazd.org/?q=en/node/21> and the TOROT
> treebank at https://universaldependencies.org, but I may be missing something.
> Thanks, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Research Associate
> Bavarian Academy of Sciences and Humanities <http://badw.de/>
> Profile: Socioware Development <http://www.socioware.de/osherenko_page.html>
> Profile: Humboldt-Universität zu Berlin
> <https://wirsindhumboldt.de/de/VKkZNyFaeu>
> Profile: ResearchGate
> <https://www.researchgate.net/profile/Alexander_Osherenko>
> Channel: Youtube <https://www.youtube.com/user/MrOsherenko>

> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]