Dear Corpora subscribers,

I'm pleased to announce the availability of two new corpora of automatic speech 
recognition transcripts from the YouTube channels of municipalities and other 
local government entities:


  *   The Corpus of Australian and New Zealand Spoken English (CoANZSE: 
https://cc.oulu.fi/~scoats/CoANZSE.html): a 196-million-word corpus of 57k 
transcripts from 482 YouTube channels, corresponding to 24k hours of video.

  *   The Corpus of German Speech (CoGS: 
https://cc.oulu.fi/~scoats/CoGS.html): a 51-million-word corpus of 39k 
transcripts from 1.3k YouTube channels, corresponding to 7.2k hours of video.

The corpora were built with methods similar to those used for the 
Corpus of North American Spoken English 
(https://cc.oulu.fi/~scoats/CoNASE.html) and the Corpus of British Isles Spoken 
English (https://cc.oulu.fi/~scoats/CoBISE.html). Transcript metadata includes 
location and video URL. Because tokens have word timing information, the 
corpora can serve as starting points for the collection of audio or video data 
targeting specific utterances.
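
As a rough illustration of that last point, the Python sketch below turns a 
token's start time and its video URL into a link that opens the video at that 
moment. The token field names ("word", "start") and the placeholder video ID 
are assumptions made for the example and may not match the actual corpus 
format; the only external behavior relied on is YouTube's `t=<seconds>s` URL 
parameter.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def timestamped_url(video_url: str, start_seconds: float) -> str:
    """Return video_url with a t=<seconds>s query parameter set."""
    parts = urlparse(video_url)
    query = parse_qs(parts.query)
    # Truncate to whole seconds, so playback begins at or just before the token.
    query["t"] = [f"{int(start_seconds)}s"]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical token record; the real corpus fields may be named differently.
token = {"word": "council", "start": 83.4}
video_url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder ID

print(timestamped_url(video_url, token["start"]))
```

The resulting link can be opened directly or used as a starting point for 
locating the utterance in the audio or video.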

The corpora are available free of charge for academic/research purposes. 
Download links are on the web pages.

With kind regards,

Steven Coats
University of Oulu, Finland

