Dear Sender,

I am currently out of the office and will not be checking email regularly. I 
will return on September 9 and will respond to your message as soon as 
possible after that date.

Best regards,
Charlott Jakob

On 15 Aug 2024, at 17:51, Penn LDC via Corpora <[email protected]> wrote:

In this newsletter:
Fall 2024 LDC Data Scholarship program

New publications:
LORELEI Uyghur Incident Language Pack
Ravnursson Faroese Speech and Transcripts



Fall 2024 LDC Data Scholarship program
Student applications for the Fall 2024 LDC Data Scholarship program are being 
accepted now through September 15, 2024. This program provides eligible 
students with no-cost access to LDC data. Students must complete an application 
consisting of a data use proposal and letter of support from their advisor. For 
application requirements and program rules, visit the LDC Data Scholarships 
page.


New publications:
LORELEI Uyghur Incident Language Pack was developed by LDC and comprises 
28 million words of Uyghur monolingual text, 500,000 words of English 
monolingual text, 3.3 million words of parallel and comparable Uyghur-English 
text, and 200,000 words annotated for simple named entities and situation 
frames. It constitutes all of the text data, annotations, supplemental 
resources, and related software tools for the Uyghur language that were used in 
the DARPA LORELEI / LoReHLT 2016 Evaluation.

The LORELEI (Low Resource Languages for Emergent Incidents) program was 
concerned with building human language technology for low resource languages in 
the context of emergent situations. In the evaluation scenario, an unforeseen 
event triggered a need for humanitarian and logistical support in a region 
where the incident language had received little or no attention in NLP 
research. Evaluation participants provided NLP solutions, including information 
extraction and machine translation, with limited resources and limited 
development time.

Data was collected from news, social network, weblog, newsgroup, discussion 
forum, and reference material. Named entity annotation identified entities to 
be detected by systems for scoring purposes. Situation frame analysis was 
designed to extract basic information about needs and relevant issues for 
planning a disaster response effort.

2024 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.


*


Ravnursson Faroese Speech and Transcripts contains 109 hours of Faroese 
prompted speech from 433 speakers (249 female, 184 male), along with 
corresponding transcripts and speaker metadata. It is an extract from the 
Basic Language Resource Kit 1.0 (BLARK 1.0) developed by the Faroe Islands' 
Ravnur Project.

Speech data was collected in 2022. Speakers from all major dialect areas in the 
Faroe Islands in three age groups -- 15-35, 36-60, and 61+ years -- read texts 
that included a word list, a phrase list, closed vocabulary readings, and short 
texts. Recordings also contain spontaneous speech. Orthographic transcripts are 
included.

2024 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data at no cost.

To unsubscribe from this newsletter, log in to your LDC account and uncheck the 
box next to “Receive Newsletter” under Account Options or contact LDC for 
assistance.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104





_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]