[Mt-list] ELRA Language Resources Catalogue - Update

ELRA ELDA Information Tue, 07 May 2019 07:07:40 -0700

[Apologies for multiple postings]

ELRA is happy to announce that 4 new Speech resources, 1 new Written Corpus and 1 new Multilingual Lexicon are now available in our catalogue.


*ELRA-S0399 GlobalPhone Multilingual Model Package*

*ISLRN: 204-945-263-927-6 <http://www.islrn.org/resources/204-945-263-927-6>*
The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages (Arabic, Bulgarian, Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech, French, German, Hausa, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish (Latin America), Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, and Vietnamese). The GlobalPhone Multilingual Model Package covers about 1 hour of transcribed speech from 10 speakers (5 male, 5 female) from each of the above listed 22 languages.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0399/

*ELRA-S0400 GlobalPhone 2000 Speaker Package*

*ISLRN: 331-592-378-424-7 <http://www.islrn.org/resources/331-592-378-424-7>*
The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages (Arabic, Bulgarian, Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech, French, German, Hausa, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish (Latin America), Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, and Vietnamese). The GlobalPhone 2000 Speaker Package covers about 9,000 randomly selected utterances read by 2000 native speakers in 22 languages, i.e. on average 4.5 utterances, corresponding to 40 seconds of speech, per speaker, amounting to a total of 22 hours of speech.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0400/


*ELRA-S0402 Speaking atlas of the regional languages of France*

*ISLRN: 112-393-061-014-3 <http://www.islrn.org/resources/112-393-061-014-3>*
The Speaking atlas of the regional languages of France offers the same Aesop’s fable read in French and in a number of varieties of languages of France. This work, which has a scientific and heritage dimension, highlights the linguistic diversity of Metropolitan France and the Overseas Territories through recordings collected in the field and presented via an interactive map, with their orthographic transcription. As far as Occitan is concerned, about sixty varieties were collected in Gascony, Languedoc, Provence, northern Occitania and the Linguistic Crescent. Varieties of Basque, Breton, Franconian, West Flemish, Alsatian, Corsican, Catalan, Francoprovençal and Oïl language(s) are also provided, as well as about fifty languages in the French Overseas Territories and non-territorial languages such as Rromani and French Sign Language.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0402/


*ELRA-S0403 CLE Pakistan Urdu Speech Corpus*

*ISLRN: 572-070-066-634-8 <http://www.islrn.org/resources/572-070-066-634-8>*
This corpus consists of phonetically rich Urdu sentences and additional sentences covering telephone numbers, addresses and personal names. The speech corpus was recorded with a variety of microphone types. The sampling rate of the speech files is 16 kHz. Each utterance is stored in a separate file and is accompanied by its orthographic transcription file in Unicode.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0403/

*ELRA-W0128 ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1*

*ISLRN: 036-939-425-010-1 <http://www.islrn.org/resources/036-939-425-010-1>*
This corpus is a collection of XML metatextually tagged corpora containing speeches from European chambers. It is a bilingual, bidirectional written corpus in English and Spanish. This first set (ECPC_EP-05) consists of (1) a "clean" version in XML of the European Parliament's 2005 daily sessions; (2) a POS-tagged version of the 2005 daily sessions; and (3) a sentence-based aligned version of the 2005 daily sessions. In its raw format, ECPC_EP-05 contains 3,668,476 tokens/words (excluding tagging) in English distributed over 60 UTF-8 files and 3,993,867 tokens/words (excluding tagging) in Spanish distributed over 60 UTF-8 files.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-W0128/

*ELRA-M0051 EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages*

*ISLRN: 690-348-503-270-1 <http://www.islrn.org/resources/690-348-503-270-1>*
This lexicon consists of 26,155 parallel named entities in seven languages: English and six South Slavic ones: Bosnian, Bulgarian, Croatian, Macedonian, Serbian and Slovenian. The lexicon contains multiword entries which are not strictly named entities, but contain a word which is. Slovenian, Croatian and Bosnian are written in Latin script, Macedonian and Bulgarian in Cyrillic. Serbian is a special case, since it may come in two scripts (Cyrillic and Latin) and two dialects (ekavica and ijekavica). This lexicon uses the Serbian ekavica variant in Cyrillic script. The lexicon comes in two formats: CSV and XML.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0051/


For more information on the catalogue, please contact Valérie Mapelli mailto:[email protected]
If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.



Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/

_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list
