[Apologies for multiple postings]
We are happy to announce that 2 new Written Corpora and 4 new Speech
resources are now available in our catalogue.
*ELRA-W0126 Training and test data for Arabizi detection and
transliteration*
*ISLRN: 986-364-744-303-9
<http://www.islrn.org/resources/986-364-744-303-9>*
The dataset is composed of two parts: a collection of mixed English and
Arabizi text, intended to train and test a system for automatically
detecting code-switching in mixed English and Arabizi texts; and a set
of 3,452 Arabizi tokens manually transliterated into Arabic, intended to
train and test an Arabizi-to-Arabic transliteration system.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-W0126/
*ELRA-W0127 Normalized Arabic Fragments for Inestimable Stemming (NAFIS)*
*ISLRN: 305-450-745-774-1
<http://www.islrn.org/resources/305-450-745-774-1>*
This is an Arabic stemming gold-standard corpus consisting of 37
manually annotated sentences, selected to be representative of Arabic
stemming tasks. The sentences are drawn from various sources (poems, the
Holy Quran, books, and periodicals) and cover diverse genres (proverbs
and dicta, article commentary, religious text, literature, historical
fiction). NAFIS is encoded according to the TEI standard.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-W0127/
*ELRA-S0396 Mbochi speech corpus*
*ISLRN: 747-055-093-447-8
<http://www.islrn.org/resources/747-055-093-447-8>*
This corpus consists of 5,131 sentences recorded in Mbochi, together
with their transcriptions and French translations, as well as the
results of the work carried out during the JSALT workshop: phone-level
alignments and various results of unsupervised word segmentation from
audio. The audio corpus totals 4.5 hours, downsampled to 16 kHz, 16-bit,
Linear PCM encoding. The data is split into two parts: a training set of
4,617 sentences and a development set of 514 sentences.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0396/
*ELRA-S0397 Chinese Mandarin (South) database*
*ISLRN: 503-886-852-083-2
<http://www.islrn.org/resources/503-886-852-083-2>*
This database contains recordings of 1,000 Chinese Mandarin speakers
from Southern China (500 males and 500 females), aged 18 to 60, recorded
in quiet studios. Recordings were made with microphone headsets and
comprise 341 hours of audio data (about 30 minutes per speaker), stored
in .WAV files as 48 kHz, mono, 16-bit Linear PCM.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0397/
*ELRA-S0398 Chinese Mandarin (North) database*
*ISLRN: 353-548-770-894-7
<http://www.islrn.org/resources/353-548-770-894-7>*
This database contains recordings of 500 Chinese Mandarin speakers
from Northern China (250 males and 250 females), aged 18 to 60, recorded
in quiet studios. Recordings were made with microphone headsets and
comprise 172 hours of audio data (about 30 minutes per speaker), stored
in .WAV files as 48 kHz, mono, 16-bit Linear PCM.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0398/
*ELRA-S0401 Persian Audio Dictionary*
*ISLRN: 133-181-128-420-9
<http://www.islrn.org/resources/133-181-128-420-9>*
This dictionary consists of more than 50,000 entries (along with almost
all word forms and proper names), with corresponding MP3 audio files and
English transliterations. All words were recorded by a single speaker
using standard Persian (Farsi) pronunciation. The dictionary is provided
with its software.
For more information, see:
http://catalog.elra.info/en-us/repository/browse/ELRA-S0401/
For more information on the catalogue, please contact Valérie Mapelli
mailto:[email protected]
If you would like to enquire about having your resources distributed by
ELRA, please do not hesitate to contact us.
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates:
http://www.elra.info/en/catalogues/language-resources-announcements/
_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list