[Corpora-List] March 2024 Newsletter - LDC

Penn LDC via Corpora Fri, 15 Mar 2024 09:39:49 -0700

In this newsletter:
LDC data and commercial technology development

New publications:
RATS Low Speech Density<https://catalog.ldc.upenn.edu/LDC2024S03>
BabyEars Affective Vocalizations<https://catalog.ldc.upenn.edu/LDC2024S04>


________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a prerequisite 
for obtaining a commercial license to almost all LDC databases. Non-member 
organizations, including non-member for-profit organizations, cannot use LDC 
data to develop or test products for commercialization, nor can they use LDC 
data in any commercial product or for any commercial purpose. LDC data users 
should consult corpus-specific license agreements for limitations on the use of 
certain corpora. Visit the 
Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for 
further information.

________________________________
New publications:
RATS Low Speech Density<https://catalog.ldc.upenn.edu/LDC2024S03> was developed 
by LDC and consists of 87 hours of English, Levantine Arabic, Farsi, Pashto, 
and Urdu speech samples and non-speech samples. The recordings were assembled 
by concatenating a randomized selection of speech, communications systems 
sounds, and silence. This corpus was created to measure false alarm performance 
in RATS speech activity detection systems.

The source audio was extracted from RATS development and progress sets and 
consists of conversational telephone speech recordings collected by LDC. 
Non-speech samples were selected from communications systems sounds, including 
telephone network special information tones, radio selective calling signals, 
HF/VHF/UHF digital mode radio traffic, radio network control channel signals, 
two-way radio traffic containing roger beeps, and short duration shift-key 
modulated handset data transmissions.

The goal of the RATS (Robust Automatic Transcription of Speech) program was to 
develop human language technology systems capable of performing speech 
detection, language identification, speaker identification, and keyword 
spotting on the severely degraded audio signals typical of radio 
communication channels, especially those employing various types of 
handheld portable transceiver systems.

2024 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

*

BabyEars Affective Vocalizations<https://catalog.ldc.upenn.edu/LDC2024S04> 
contains 22 minutes of spontaneous English speech by 12 adults interacting with 
their infant children, for a total of 509 infant-directed utterances and 185 
adult-directed or neutral utterances. Speech data was collected in a quiet room 
during a one-hour session in which each parent was asked to play and otherwise 
interact normally with their infant (aged 10-18 months). A trained research 
assistant then extracted discrete utterances and classified them into three 
categories: approval, attention, and prohibition.

2024 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
