In this newsletter:
LDC at IC2S2
Fall 2024 LDC Data Scholarship Program

New publications:
MATERIAL Bulgarian-English Language 
Pack<https://catalog.ldc.upenn.edu/LDC2024S07>
Dialogs Re-Enacted Across Languages<https://catalog.ldc.upenn.edu/LDC2024S08>

________________________________
LDC at IC2S2
LDC is delighted to be a bronze sponsor for the 10th International Conference 
on Computational Social Science (IC2S2<https://ic2s2-2024.org/>) held this year 
on Penn's campus July 17-20. The conference will feature research from around 
the world across a broad range of relevant fields to advance the many frontiers 
of computational social science. Be sure to visit LDC's table during the poster 
sessions July 18 and 19 from 1:30-2:30 pm.

Fall 2024 LDC Data Scholarship Program
Student applications for the Fall 2024 LDC Data Scholarship program are being 
accepted now through September 15, 2024. This program provides eligible 
students with no-cost access to LDC data. Students must complete an application 
consisting of a data use proposal and letter of support from their advisor. For 
application requirements and program rules, visit the LDC Data Scholarships 
page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________

New publications:
MATERIAL Bulgarian-English Language 
Pack<https://catalog.ldc.upenn.edu/LDC2024S07> was developed by 
Appen<http://www.appen.com/> for the IARPA (Intelligence Advanced Research 
Projects Activity) 
MATERIAL<https://www.iarpa.gov/index.php/research-programs/material> (Machine 
Translation for English Retrieval of Information in Any Language) program. It 
contains 80 hours of Bulgarian conversational telephone speech, transcripts, 
English translations, annotations, and queries.

Calls were made using different telephones (e.g., mobile, landline) from a 
variety of environments. Transcripts cover approximately 40% of the speech 
files, and approximately 10% of the speech files were translated into English. 
This release also includes domain annotations, English queries, and their 
relevance annotations.

The MATERIAL program focused on underserved languages, with the ultimate goal of 
building cross-language information retrieval systems that find speech and text 
content using English search queries.

2024 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.

*

Dialogs Re-Enacted Across Languages<https://catalog.ldc.upenn.edu/LDC2024S08> 
was developed at the University of Texas at El Paso<https://www.utep.edu/>. It 
contains 17 hours of conversational speech in English and Spanish from 129 unique 
bilingual speakers: short fragments extracted from spontaneous conversations, 
paired with close re-enactments in the other language by the original 
speakers, for a total of 3816 pairs of matching utterances. Data was collected in 
2022-2023. Participants were recruited from among students at the University of 
Texas at El Paso; all were bilingual speakers of General American English and 
of Mexico-Texas Border Spanish.

Each speaker pair had a 10-minute conversation in one language. Various 
fragments from these conversations were chosen for re-enactment, and the 
original speakers produced equivalents in the other language. Each re-enactment 
was vetted for fidelity to the original and naturalness in the target language. 
Also included is metadata about conversations, participants, re-enactments and 
utterances.

2024 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options, or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
