Dear Sender,

I am currently out of the office and will not be checking emails regularly. I 
will return on September 9 and will respond to your message as soon as 
possible after that date.

Best regards,
Charlott Jakob

On 15 Jul 2024, at 18:14, Penn LDC via Corpora <[email protected]> wrote:

In this newsletter:
LDC at IC2S2
Fall 2024 LDC Data Scholarship Program

New publications:
MATERIAL Bulgarian-English Language Pack
Dialogs Re-Enacted Across Languages



LDC at IC2S2
LDC is delighted to be a bronze sponsor for the 10th International Conference 
on Computational Social Science (IC2S2) held this year on Penn’s campus July 
17-20. The conference will feature research from around the world across a 
broad range of relevant fields to advance the many frontiers of computational 
social science. Be sure to visit LDC’s table during the poster sessions July 18 
and 19 from 1:30-2:30 pm.

Fall 2024 LDC Data Scholarship Program
Student applications for the Fall 2024 LDC Data Scholarship program are being 
accepted now through September 15, 2024. This program provides eligible 
students with no-cost access to LDC data. Students must complete an application 
consisting of a data use proposal and letter of support from their advisor. For 
application requirements and program rules, visit the LDC Data Scholarships 
page.


New publications:
MATERIAL Bulgarian-English Language Pack was developed by Appen for the IARPA 
(Intelligence Advanced Research Projects Activity) MATERIAL (Machine 
Translation for English Retrieval of Information in Any Language) program. It 
contains 80 hours of Bulgarian conversational telephone speech, transcripts, 
English translations, annotations, and queries.

Calls were made using different telephones (e.g., mobile, landline) from a 
variety of environments. Transcripts cover approximately 40% of the speech 
files, and approximately 10% of the speech files were translated into English. 
This release also includes domain annotations, English queries, and their 
relevance annotations.

The MATERIAL program focused on underserved languages, with the ultimate goal 
of building cross-language information retrieval systems that find speech and 
text content using English search queries.

2024 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.


*


Dialogs Re-Enacted Across Languages was developed at the University of Texas at 
El Paso. It contains 17 hours of conversational speech in English and Spanish 
by 129 unique bilingual speakers. The data consists of short fragments extracted 
from spontaneous conversations, each paired with a close re-enactment in the 
other language by the original speakers, for 3816 pairs of matching utterances. 
Data was collected in 
2022-2023. Participants were recruited from among students at the University of 
Texas at El Paso; all were bilingual speakers of General American English and 
of Mexico-Texas Border Spanish.

Each speaker pair had a 10-minute conversation in one language. Various 
fragments from these conversations were chosen for re-enactment, and the 
original speakers produced equivalents in the other language. Each re-enactment 
was vetted for fidelity to the original and naturalness in the target language. 
Also included is metadata about conversations, participants, re-enactments and 
utterances.

2024 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and uncheck the 
box next to “Receive Newsletter” under Account Options or contact LDC for 
assistance.


Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]