Our apologies if you have received multiple copies of this announcement.
Please note that you receive this email because you are or have been a customer or a provider of ELRA Language Resources.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

We are happy to announce that 3 new Speech Resources, 2 new Speech-related Resources, 3 new Written Corpora and 2 new Monolingual Lexicons are now available in our catalogue.

*ELRA-S0375 GlobalPhone Swahili*
*ISLRN: 200-331-212-512-8* <http://islrn.org/resources/200-331-212-512-8/>
The GlobalPhone Swahili corpus contains 7,728 utterances spoken by 70 speakers. Native speakers of Swahili were asked to read prompted sentences from newspaper articles. The entire collection took place in Nairobi, Kenya. For more information, see: http://catalog.elra.info/product_info.php?products_id=1258

*ELRA-S0376 GlobalPhone Swahili Pronunciation Dictionary*
*ISLRN: 010-360-238-702-2* <http://islrn.org/resources/010-360-238-702-2/>
The GlobalPhone pronunciation dictionaries contain the pronunciations of all word forms found in the transcription data of the GlobalPhone speech & text database. The Swahili dictionary contains 10,664 entries. For more information, see: http://catalog.elra.info/product_info.php?products_id=1259

*ELRA-S0377 GlobalPhone Ukrainian*
*ISLRN: 456-398-378-806-1* <http://islrn.org/resources/456-398-378-806-1/>
The GlobalPhone Ukrainian corpus contains 12,814 utterances spoken by 119 speakers. Native speakers of Ukrainian were asked to read prompted sentences from newspaper articles. The entire collection took place in Donetsk, Ukraine. For more information, see: http://catalog.elra.info/product_info.php?products_id=1260

*ELRA-S0378 GlobalPhone Ukrainian Pronunciation Dictionary*
*ISLRN: 022-652-862-222-7* <http://islrn.org/resources/022-652-862-222-7/>
The GlobalPhone pronunciation dictionaries contain the pronunciations of all word forms found in the transcription data of the GlobalPhone speech & text database. The Ukrainian dictionary contains 7,748 entries (7,740 words). For more information, see: http://catalog.elra.info/product_info.php?products_id=1261

*ELRA-S0379 JV_TDM Corpus*
*ISLRN: 371-240-320-910-4* <http://islrn.org/resources/371-240-320-910-4/>
This corpus provides a phonetic annotation of 37 chapters of the original French version of “Around the World in 80 Days” by Jules Verne read by a single speaker. Each chapter has been annotated in a separate .TextGrid file. The total audio size is 6h 41mn 36s with 5h 2mn 41s of speech. The .TextGrid files contain several annotation tiers: phoneme, number of characters, syllable, transcription, PoS, paragraph break, sentence break, prosodic annotations, breathing pauses. For more information, see: http://catalog.elra.info/product_info.php?products_id=1252

*ELRA-W0088 ROMBAC - Romanian balanced corpus*
*ISLRN: 162-192-982-061-0* <http://islrn.org/resources/162-192-982-061-0/>
ROMBAC is a Romanian corpus containing equal shares of texts from 5 different genres: journalism, legalese, fiction, medicine and biographical data for Romanian literary personalities. The entire corpus contains around 41,000,000 words, including punctuation. The corpus is annotated at paragraph, sentence, constituent group and word levels, and it provides morpho-syntactic information (MSD). It is XML-encoded. For more information, see: http://catalog.elra.info/product_info.php?products_id=1253

*ELRA-W0089 NPChunks*
*ISLRN: 412-883-442-173-8* <http://islrn.org/resources/412-883-442-173-8/>
NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. The corpus is PoS-annotated at token level, including punctuation. Noun phrases were annotated with specific tags. It was automatically PoS-tagged with the MBT tagger and lemmatized with MBLEM, following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. For more information, see: http://catalog.elra.info/product_info.php?products_id=1256

*ELRA-W0090 EUROPARL Corpus Parallel Corpora: Portuguese-English*
*ISLRN: 435-502-922-727-2* <http://islrn.org/resources/435-502-922-727-2/>
The Portuguese-English subpart of the EUROPARL Corpus was extracted from the proceedings of the European Parliament. It contains approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation). It is composed of one text file for the English corpus and two files for the Portuguese version: a text file and an annotated file, containing a PoS tag and a lemma for each token. For more information, see: http://catalog.elra.info/product_info.php?products_id=1257

*ELRA-L0096 MCL - Multifunctional Computational Lexicon of Contemporary Portuguese*
*ISLRN: 489-956-642-755-8* <http://islrn.org/resources/489-956-642-755-8/>
MCL is a 26,443-lemma frequency lexicon with 140,315 tokens, extracted from CORLEX, a corpus of contemporary Portuguese (16,210,438 words). To extract the lexicon, all the different lexical forms occurring in the corpus were indexed and then morphosyntactically tagged and lemmatised by PALAVROSO. Each lemma in MCL is followed by morphosyntactic and quantitative information. For more information, see: http://catalog.elra.info/product_info.php?products_id=1254

*ELRA-L0097 LEX-MWE-PT - Word Combination in Portuguese*
*ISLRN: 353-430-176-260-6* <http://islrn.org/resources/353-430-176-260-6/>
LEX-MWE-PT is a lexicon of European Portuguese containing multiword expressions (MWEs) extracted from a balanced 50.8M-word written corpus. The lexicon covers 1,198 lemmas (single words from different PoS categories: nouns, adjectives, verbs and adverbs); 12,753 MWE lemmas (which include inflectional variants of the MWE lemmas); and 242,233 manually verified concordances of those MWEs. For more information, see: http://catalog.elra.info/product_info.php?products_id=1255


For more information on the catalogue, please contact Valérie Mapelli mailto:[email protected]

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/