[Corpora-List] August 2023 Newsletter - LDC

Penn LDC via Corpora Tue, 15 Aug 2023 08:52:48 -0700

In this newsletter:
LDC at Interspeech 2023
LDC releases speech activity detector
Fall 2023 LDC Data Scholarship Program


New publications:
2019 OpenSAT Public Safety Communications 
Simulation<https://catalog.ldc.upenn.edu/LDC2023S06>
Samrómur Queries Icelandic Speech 1.0<https://catalog.ldc.upenn.edu/LDC2023S05>
________________________________
LDC at Interspeech 2023
LDC is happy to be back in person as an exhibitor and longtime supporter of 
Interspeech, taking place this year August 20-24 in Dublin, Ireland. Stop by 
Stand A2 to say hello and learn about the latest developments at the 
Consortium. LDC is also delighted to once again be a silver sponsor for the 
Young Female Researchers in Speech 
Workshop<https://sites.google.com/view/yfrsw-2023> and to provide data in 
support of the CHiME-7 
challenge<https://www.chimechallenge.org/current/workshop/index> satellite 
workshop and the MERLIon CCS 
Challenge<https://sites.google.com/view/merlion-ccs-challenge>.

LDC will post conference updates via our social media platforms. We look 
forward to seeing you in Dublin!

LDC releases speech activity detector
LDC announces the release of the LDC Broad Phonetic Class Speech Activity 
Detector. Based on the broad phonetic class recognizer implemented in the HTK 
Speech Recognition Toolkit<https://htk.eng.cam.ac.uk/>, LDC's speech activity 
detector model runs the speech signal through a GMM-HMM recognizer to identify 
five broad phonetic classes: vowel, stop/affricate, fricative, nasal, and 
glide/liquid. The LDC Broad Phonetic Class Speech Activity Detector is 
available at no cost on 
GitHub<https://github.com/Linguistic-Data-Consortium/ldc-bpcsad> under a GPL v3 
license<https://www.gnu.org/licenses/gpl-3.0.en.html>.
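
As a rough illustration of how class-level recognizer output can be turned 
into speech activity regions, the sketch below merges adjacent segments 
labeled with one of the five broad phonetic classes into speech spans. This is 
not the ldc-bpcsad API: the label strings, the "nonspeech" label, the 
merge_speech_segments function, and the max_gap parameter are all assumptions 
made for this example.

```python
from typing import List, Tuple

# Hypothetical label names for the five broad phonetic classes; the labels
# actually emitted by the LDC tool may differ.
SPEECH_CLASSES = {"vowel", "stop/affricate", "fricative", "nasal", "glide/liquid"}

Segment = Tuple[float, float, str]  # (onset_sec, offset_sec, label)


def merge_speech_segments(
    segments: List[Segment], max_gap: float = 0.0
) -> List[Tuple[float, float]]:
    """Collapse class-labeled segments into (onset, offset) speech regions.

    Any segment whose label is not a broad phonetic class is treated as
    non-speech. Speech segments separated by at most max_gap seconds are
    merged into a single region.
    """
    regions: List[Tuple[float, float]] = []
    for onset, offset, label in sorted(segments):
        if label not in SPEECH_CLASSES:
            continue
        if regions and onset - regions[-1][1] <= max_gap:
            # Extend the previous region instead of starting a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], offset))
        else:
            regions.append((onset, offset))
    return regions


# Made-up recognizer output, for illustration only.
example = [
    (0.00, 0.40, "nonspeech"),
    (0.40, 0.55, "fricative"),
    (0.55, 0.80, "vowel"),
    (0.80, 1.50, "nonspeech"),
    (1.50, 1.65, "nasal"),
    (1.65, 1.90, "vowel"),
]
print(merge_speech_segments(example))  # [(0.4, 0.8), (1.5, 1.9)]
```

Raising max_gap bridges short pauses between speech regions. A suitable value 
depends on the application and is not specified in this announcement.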

Fall 2023 LDC Data Scholarship Program
Student applications for the Fall 2023 LDC Data Scholarship program are being 
accepted now through September 15, 2023. This program provides eligible 
students with no-cost access to LDC data. Students must complete an application 
consisting of a data use proposal and letter of support from their advisor. For 
application requirements and program rules, visit the LDC Data Scholarships 
page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________
New publications:
2019 OpenSAT Public Safety Communications 
Simulation<https://catalog.ldc.upenn.edu/LDC2023S06> contains 141 hours of 
English speech recordings and transcripts used in the NIST Open Speech Analytic 
Technologies (OpenSAT<https://www.nist.gov/itl/iad/mig/opensat>) 2019 
evaluation's automatic speech recognition, speech activity detection, and 
keyword search tasks. The data is part of the SAFE-T (Speech Analysis For 
Emergency Response Technology) corpus created by LDC, which consists of 
recordings of speakers engaged in a collaborative problem-solving activity 
representative of 
public safety communications in terms of speech content, noise types, and noise 
levels.

US English speakers played the board game Flash Point Fire Rescue. Background 
noise was played through a participant's headset during the recording session. 
Recording sessions consisted of two 30-minute games. The corpus is divided into 
training, development, and evaluation data.

2023 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

*

Samrómur Queries Icelandic Speech 1.0<https://catalog.ldc.upenn.edu/LDC2023S05> 
was developed by the Language and Voice Lab, Reykjavik 
University<https://lvl.ru.is/> in cooperation with Almannarómur, Center for 
Language Technology<https://almannaromur.is/>. The corpus contains 20 hours of 
prompted Icelandic queries: 17,475 utterances from 3,809 speakers.

Speech data was collected between October 2019 and December 2021 using the 
Samrómur website<https://samromur.is>, which displayed prompts to participants. 
The prompts were mainly drawn from The Icelandic Gigaword 
Corpus<http://clarin.is/en/resources/gigaword>, which includes text from 
novels, news, and plays, and from a list of location names in Iceland. Additional 
prompts were taken from the Icelandic Web of 
Science<https://www.visindavefur.is/>, and others were created by following a 
name with a question or a demand. Prompts and speaker metadata are 
included in the corpus.

2023 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.

Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104