[Corpora-List] February 2025 Newsletter - LDC

Penn LDC via Corpora Mon, 17 Feb 2025 10:07:15 -0800

In this newsletter:
LDC at LT4All 2025
LDC membership discounts expire March 3
Spring 2025 data scholarship recipients


New publications:
AIDA Scenario 3 Practice Topic Source Data and 
Annotation<https://catalog.ldc.upenn.edu/LDC2025T02>
MATERIAL Georgian-English Language 
Pack<https://catalog.ldc.upenn.edu/LDC2025S01>
________________________________
LDC at LT4All 2025
LDC is pleased to be a sponsor of the 2nd International Conference on Language 
Technologies for All (LT4All 2025)<https://www.lt4all2025.eu/overview/>, 
February 24-26, 2025, organized by ELRA and SIGUL, the ELRA/ISCA Special 
Interest Group on Under-resourced Languages, and in partnership with UNESCO as 
part of the International Decade of Indigenous Languages (2022-2032). The 
conference theme, "Advancing Humanism through Language Technologies," focuses 
on community empowerment within the larger discussion on the many ways 
technology impacts language communities. The conference will also commemorate 
the Silver Jubilee of International Mother Language Day (February 21).

LDC membership discounts expire March 3
Time is running out to save on 2025 membership fees. Renew your LDC membership, 
rejoin the Consortium, or become a new member by March 3 to receive a discount 
of up to 10%. For more information on membership benefits and options, visit 
Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.

Spring 2025 data scholarship recipients
Congratulations to the recipients of LDC's Spring 2025 data scholarships:

Sair Buckle, Charles Sturt University (Australia); PhD student, AI and Cyber 
Futures Institute. Sair is awarded a copy of Avocado Research Email Corpus 
LDC2015T03 for her work in behavioral science.

Le Phuoc Thinh Tien, Vietnam National University Ho Chi Minh City (Vietnam); 
Bachelor's student, Faculty of Information Technology. Le is awarded a copy of 
Penn Discourse Treebank Version 3.0 LDC2019T05 for his research in natural 
logical reasoning.

The next round of applications will be accepted in September 2025. For 
information about the program, visit the Data Scholarships 
page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________

New publications:

AIDA Scenario 3 Practice Topic Source Data and 
Annotation<https://catalog.ldc.upenn.edu/LDC2025T02> was developed by LDC and 
consists of English, Russian, and Spanish web documents (text, video, 
image) and annotations. Each phase of the AIDA program centered on a specific 
scenario, or broad topic area, with related subtopics designated as either 
practice subtopics or evaluation subtopics. The Phase 3 scenario focused on the 
COVID-19 global pandemic. This corpus contains source documents and annotations 
for the Scenario 3 practice topics.

The corpus contains 1417 root documents; 279 documents were annotated. 
Annotations include:

  *   Event, relation, and entity annotation (64 documents)
  *   Claim frame annotation: claims (true or not) relating to the COVID-19 
pandemic (203 documents)
  *   Practice topic query claim frames: example claim frames intended to be 
used by systems as queries to extract similar claims from additional documents 
(30 documents)

The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed 
to develop a multi-hypothesis semantic engine to generate explicit alternative 
interpretations of events, situations, and trends from a variety of 
unstructured sources. LDC supported AIDA by collecting, creating, and 
annotating multimodal linguistic resources in multiple languages.

2025 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

*

MATERIAL Georgian-English Language 
Pack<https://catalog.ldc.upenn.edu/LDC2025S01> was developed by 
Appen<http://www.appen.com/> for the IARPA 
MATERIAL<https://www.iarpa.gov/index.php/research-programs/material> program 
and contains 79 hours of Georgian conversational telephone speech, transcripts, 
English translations, annotations, and queries. Calls were made using different 
telephones (e.g., mobile, landline) from a variety of environments. Transcripts 
cover approximately half of the speech files, and approximately 3% of the 
speech data was translated into English. This release also includes English 
queries and their relevance annotations.

The MATERIAL program focused on underserved languages, with the ultimate goal 
of building cross-language information retrieval systems that find speech and 
text content using English search queries.

2025 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
