[Mt-list] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information Thu, 13 Sep 2012 08:46:26 -0700

Our apologies if you have received multiple copies of this announcement.


*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 2 new Speech Desktop/Microphone Resources and 2 new Written Corpora are now available in its catalogue.

*ELRA-S0345 Spoken Portuguese Corpus*

The Spoken Portuguese corpus consists of a total of 86 recordings (8h44m), collected among sociolinguistically diverse speakers having Portuguese as mother tongue or as second language. The corpus was recorded in a situation of spontaneous oral communication, on different themes of everyday life, with speakers of different ages and social and professional backgrounds. The corpus consists of audio files in .wav format, aligned transcriptions in XML Exmaralda format and transcriptions in plain text.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1172


*ELRA-S0346 Fundamental Portuguese Corpus*

The Fundamental Portuguese Corpus is a corpus of spoken language, collected between 1970 and 1974, composed of 1800 recordings (500 hours) made in Continental Portugal and the Islands. Of these 1800 conversations, a sample was selected and transcribed. The corpus consists of audio files in .wav format, aligned transcriptions in XML Exmaralda format and transcriptions in plain text.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1173


*ELRA-W0055 CINTIL-TreeBank*

The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) that are used for regression testing of the computational grammar that supported the annotation of the corpus.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1174

*ELRA-W0056 CINTIL-PropBank*

The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1176

For more information on the catalogue, please contact Valérie Mapelli: mailto:[email protected]


Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html

_______________________________________________
Mt-list mailing list
