Dear all,

We are happy to announce that the LASLA Latin corpus has been published in Open 
Access under a CC-BY-NC-SA 4.0 license. The published portion of the LASLA 
corpus comprises ca. 1.7 million tokens of works from the Classical period, 
manually annotated with the following information: lemma, Part-of-Speech, 
morphological features, partial syntactic information, and metadata. The LASLA 
has ongoing annotation projects, whose results will be uploaded to the 
Dataverses when they are finalised. We hope to provide a service to the 
community working on Latin linguistics and Latin literary studies, as well as 
to support the most recent NLP trends.

The corpus can be accessed in three Dataverses, each containing one specific 
format. We recommend using the “Tree View” to get an idea of which files can be 
found in the Dataverse.


  *   DAT and APN files (resp. https://doi.org/10.58119/ULG/27VZID and 
https://doi.org/10.58119/ULG/QJJ0SA) are published with detailed documentation 
on the codes used and on all the annotation choices implemented by the LASLA 
over the years. We hope that this documentation will help external researchers 
make the best use of the data.
  *   BPN files (https://doi.org/10.58119/ULG/49UQNU) were previously shared 
with external partners under Data Transfer Agreements. Beyond documentation 
purposes, this Dataverse also provides the original version on which the 
CoNLL-U format was based (see below).

The LASLA files can be explored via two free online interfaces. Opera 
Latina<http://cipl93.philo.ulg.ac.be/OperaLatina/> enables structured searches 
through the files; an account can be requested by contacting Lauren Simon 
(email [email protected]<mailto:[email protected]>). 
HyperbaseWeb<http://hyperbase.unice.fr/hyperbase/> (Latin bases) does not 
require an account and allows complex statistical queries to be carried out; 
documentation is available 
here<https://margheritafantoli.wordpress.com/2021/04/22/having-fun-with-hyperbaseweb-and-the-english-royal-family/>
 and 
here<https://margheritafantoli.wordpress.com/2021/04/22/having-fun-with-hyperbaseweb-and-the-english-royal-family-ii/>.

Following the Data Transfer Agreement for BPNs, an intense collaboration with 
the LiLa ERC project<https://lila-erc.eu/> started. The output of this 
collaboration is the following:


  *   The LASLA corpus is linked to the LiLa Knowledge Base and can be queried, 
jointly with all the other resources linked, via the LiLa Interactive Search 
Platform<https://lila-erc.eu/LiLaLisp/> and SPARQL<https://lila-erc.eu/sparql/> 
endpoint. The triples of the linking are published openly here.
  *   The LiLa team has converted the BPN files into CoNLL-U files, enriching 
the annotation with the URIs of tokens and lemmas as they are found in the LiLa 
Knowledge Base. This version of the corpus can be found on 
Zenodo<https://doi.org/10.5281/zenodo.5961377> and 
GitHub<https://github.com/CIRCSE/LASLA>.
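
For those who want to work with the CoNLL-U version directly, here is a 
minimal sketch of how such files can be read in Python. It relies only on the 
standard CoNLL-U layout (ten tab-separated columns, "#" comment lines, blank 
lines between sentences). The function names and the two-token sample are 
invented for illustration and are not taken from the LASLA files; the sketch 
also keeps the MISC column as a raw string, since this announcement does not 
say in which column the LiLa URIs are stored.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U column names, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line: ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            # Not a well-formed token line: skipped in this sketch.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        yield sentence


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dictionary ('_' means no features)."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical sample, NOT copied from the corpus.
SAMPLE = (
    "# text = Gallia est\n"
    "1\tGallia\tGallia\tPROPN\t_\tCase=Nom|Number=Sing\t_\t_\t_\t_\n"
    "2\test\tsum\tAUX\t_\tMood=Ind|Number=Sing\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
lemmas = [token["lemma"] for token in sentences[0]]
```

To read a downloaded file instead of the sample, an open file handle can be 
passed in place of `SAMPLE.splitlines()`, since the reader accepts any 
iterable of lines.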

We hope that this collaboration will trigger many others, with other partners 
enriching and providing new exploitation pathways for the LASLA corpus.

For the moment, have fun!

With kind regards,

The LASLA and LiLa teams


Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
