Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 2 new Speech Desktop/Microphone Resources and 2 new Written Corpora are now available in its catalogue.

ELRA-S0345 Spoken Portuguese Corpus
The Spoken Portuguese Corpus consists of 86 recordings (8h44m) collected from sociolinguistically diverse speakers who have Portuguese as their mother tongue or as a second language. The recordings capture spontaneous oral communication on different themes of everyday life, with speakers of different ages and social and professional backgrounds. The corpus consists of audio files in .wav format, aligned transcriptions in XML Exmaralda format and transcriptions in plain text. For more information, see: http://catalog.elra.info/product_info.php?products_id=1172

ELRA-S0346 Fundamental Portuguese Corpus
The Fundamental Portuguese Corpus is a corpus of spoken language collected between 1970 and 1974, comprising 1800 recordings (500 hours) made in Continental Portugal and the Islands. A sample of these 1800 conversations was selected and transcribed. The corpus consists of audio files in .wav format, aligned transcriptions in XML Exmaralda format and transcriptions in plain text. For more information, see: http://catalog.elra.info/product_info.php?products_id=1173

ELRA-W0055 CINTIL-TreeBank
The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus. For more information, see: http://catalog.elra.info/product_info.php?products_id=1174

ELRA-W0056 CINTIL-PropBank
The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus. For more information, see: http://catalog.elra.info/product_info.php?products_id=1176

For more information on the catalogue, please contact Valérie Mapelli mailto:[email protected]

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html
_______________________________________________
Mt-list mailing list
