[Apologies for multiple postings]
ELRA is happy to announce that 4 new Speech resources, 1 new Written
Corpus and 1 new Multilingual Lexicon are now available in our catalogue.
*ELRA-S0399 GlobalPhone Multilingual Model Package*
*ISLRN: 204-945-263-927-6
<http://www.islrn.org/resources/204-945-263-927-6>*
The GlobalPhone Multilingual Model Package contains about 22 hours of
transcribed read speech spoken by native speakers in 22 languages
(Arabic, Bulgarian, Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech,
French, German, Hausa, Japanese, Korean, Polish, Portuguese (Brazilian),
Russian, Spanish (Latin America), Swahili, Swedish, Tamil, Thai,
Turkish, Ukrainian, and Vietnamese). The GlobalPhone Multilingual Model
Package covers about 1 hour of transcribed speech from 10 speakers (5
male, 5 female) in each of the 22 languages listed above.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0399/
*ELRA-S0400 GlobalPhone 2000 Speaker Package*
*ISLRN: 331-592-378-424-7
<http://www.islrn.org/resources/331-592-378-424-7>*
The GlobalPhone 2000 Speaker Package contains transcribed read speech
spoken by 2000 native speakers in 22 languages (Arabic, Bulgarian,
Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech, French, German,
Hausa, Japanese, Korean, Polish, Portuguese (Brazilian), Russian,
Spanish (Latin America), Swahili, Swedish, Tamil, Thai, Turkish,
Ukrainian, and Vietnamese). The GlobalPhone 2000 Speaker Package covers
about 9,000 randomly selected utterances read by 2000 native speakers in
22 languages, i.e. on average 4.5 utterances (corresponding to 40 seconds
of speech) per speaker, amounting to a total of 22 hours of speech.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0400/
*ELRA-S0402 Speaking atlas of the regional languages of France*
*ISLRN: 112-393-061-014-3
<http://www.islrn.org/resources/112-393-061-014-3>*
The Speaking atlas of the regional languages of France offers the same
Aesop’s fable read in French and in a number of varieties of languages
of France. This work, which has both a scientific and a heritage dimension,
highlights the linguistic diversity of Metropolitan France
and the Overseas Territories through recordings collected in the field and
presented on an interactive map, with their orthographic transcriptions.
For Occitan, about sixty varieties were collected in
Gascony, Languedoc, Provence, northern Occitania and the Linguistic
Crescent. Varieties of Basque, Breton, Franconian, West Flemish, Alsatian,
Corsican, Catalan, Francoprovençal and Oïl language(s) are also
provided, as well as about fifty languages of the French Overseas
Territories and non-territorial languages such as Rromani and French
Sign Language.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0402/
*ELRA-S0403 CLE Pakistan Urdu Speech Corpus*
*ISLRN: 572-070-066-634-8
<http://www.islrn.org/resources/572-070-066-634-8>*
This corpus consists of phonetically rich Urdu sentences and additional
sentences covering telephone numbers, addresses and personal names. The
speech was recorded with a variety of microphone types. The sampling
rate of the speech files is 16 kHz. Each utterance is stored in a separate
file and is accompanied by its orthographic transcription file in Unicode.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0403/
*ELRA-W0128 ECPC Corpus (European Comparable and Parallel Corpora of
Parliamentary Speeches Archive) – set 1*
*ISLRN: 036-939-425-010-1
<http://www.islrn.org/resources/036-939-425-010-1>*
This corpus is a collection of XML metatextually tagged corpora
containing speeches from European chambers. It is a bilingual,
bidirectional written corpus in English and Spanish. This first
set (ECPC_EP-05) consists of (1) a "clean" version in XML of the European
Parliament's 2005 daily sessions; (2) a POS-tagged version of the 2005
daily sessions; and (3) a sentence-aligned version of the 2005 daily
sessions. In its raw format, ECPC_EP-05 contains 3,668,476 tokens/words
(excluding tagging) in English distributed over 60 UTF-8 files and
3,993,867 tokens/words (excluding tagging) in Spanish distributed over
60 UTF-8 files.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-W0128/
*ELRA-M0051 EnToSSLNE - a Lexicon of Parallel Named Entities from
English to South Slavic Languages*
*ISLRN: 690-348-503-270-1
<http://www.islrn.org/resources/690-348-503-270-1>*
This lexicon consists of 26,155 parallel named entities in seven
languages: English and six South Slavic languages (Bosnian, Bulgarian,
Croatian, Macedonian, Serbian and Slovenian). The lexicon also contains
multiword entries that are not strictly named entities but contain a
word that is. Slovenian, Croatian and Bosnian are written in Latin
script, Macedonian and Bulgarian in Cyrillic. Serbian is a special
case, since it may be written in two scripts (Cyrillic and Latin) and
in two dialects (ekavica and ijekavica). This lexicon uses the Serbian
ekavica variant in Cyrillic script. The lexicon comes in two formats:
CSV and XML.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-M0051/
For more information on the catalogue, please contact Valérie Mapelli
mailto:[email protected]
If you would like to enquire about having your resources distributed by
ELRA, please do not hesitate to contact us.
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates:
http://www.elra.info/en/catalogues/language-resources-announcements/