[Corpora-List] CORRECTION: February 2024 Newsletter - LDC

Penn LDC via Corpora Thu, 15 Feb 2024 12:20:38 -0800

In this newsletter:
LDC membership discounts expire March 1
Spring 2024 data scholarship recipients
Four corpora withdrawn from the LDC Catalog


New publications:
Second Language University Speech Intelligibility 
Corpus<https://catalog.ldc.upenn.edu/LDC2024S02>
AIDA Scenario 1 Practice Topic 
Annotation<https://catalog.ldc.upenn.edu/LDC2024T02>
________________________________
LDC membership discounts expire March 1
Time is running out to save on 2024 membership fees. Renew your LDC membership, 
rejoin the Consortium, or become a new member by March 1 to receive a discount 
of up to 10%. For more information on membership benefits and options, visit 
Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.

Spring 2024 data scholarship recipients
Congratulations to the recipients of LDC's Spring 2024 data scholarships:

Jordan Chandler: Université Rennes 2 (France): Master's student, English 
Studies. Jordan is awarded a copy of Penn Parsed Corpora of Historical English 
LDC2020T16 to continue his research on the historical development of adjective, 
quantifier, and article indefiniteness in the English language.

Nikhil Raghav: TCG Crest (India): PhD candidate, Institute for Advancing 
Intelligence. Nikhil is awarded copies of Third DIHARD Challenge Development 
LDC2022S12 and Third DIHARD Challenge Evaluation LDC2022S14 for his work in 
speaker diarization.

Abraham Sanders: Rensselaer Polytechnic Institute (USA): PhD candidate, 
Cognitive Science. Abraham is awarded copies of Fisher English Training Speech 
Part 1 Speech LDC2004S13, Fisher English Training Speech Part 1 Transcripts 
LDC2004T19, Fisher English Training Part 2 Speech LDC2005S13 and Fisher English 
Training Part 2 Transcripts LDC2005T19, for his work in spoken dialogue systems.

The next round of applications will be accepted in September 2024. For 
information about the program, visit the Data Scholarships 
page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.

Four corpora withdrawn from the LDC Catalog
We regret to announce that The New York Times Annotated Corpus, LDC2008T19, has 
been withdrawn from the LDC Catalog by the data provider. Because they contain 
data from LDC2008T19, the following three corpora are also withdrawn from the 
Catalog: Benchmarks for Open Relation Extraction LDC2014T27, Concretely 
Annotated New York Times LDC2018T12, and News Sub-domain Named Entity 
Recognition LDC2023T12. Organizations and individuals who have previously 
licensed any of these data sets can continue to use them under the terms of 
their respective special license agreements.
________________________________
New publications:
Second Language University Speech Intelligibility 
Corpus<https://catalog.ldc.upenn.edu/LDC2024S02> was developed by Northern 
Arizona University, The Pennsylvania State University, and The University of 
Texas at Dallas. It contains 10.5 hours of English speech collected from 66 
international faculty and university students representing 15 language 
backgrounds at 10 North American universities. This release also includes 
orthographic transcriptions for all recordings, intelligibility scores for 73% 
of the files, speaker metadata, and aligned Praat TextGrids.

The speech data consists of presentations, descriptions, reflections, and 
microteaching tasks. Speakers were recruited from courses at intensive English 
programs and oral skills courses for international graduate students seeking to 
become international teaching assistants.

2024 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.
*
AIDA Scenario 1 Practice Topic 
Annotation<https://catalog.ldc.upenn.edu/LDC2024T02> was developed by LDC and 
consists of annotations for 212 English, Russian, and Ukrainian web 
documents (text, image, and video) from AIDA Scenario 1 Practice Topic Source 
Data (LDC2023T11)<https://catalog.ldc.upenn.edu/LDC2023T11>, specifically, the 
set of practice documents designated for annotation in Phase 1.

Annotations are presented as tab-separated files in the following categories 
for each topic:

  *   Mentions: single references in source data to a real-world entity or 
      filler, event, or relation.
  *   Slots: pre-defined roles in an event or relation filled by an argument 
      (entity mention).
  *   Linking: entity mentions linked to entries in the knowledge base to 
      indicate the real-world entity to which each mention refers.

2024 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.
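
For readers who have not worked with tab-separated annotation files like those described above, the following is a minimal Python sketch of loading one into rows keyed by column name. It is only an illustration under stated assumptions: the helper name read_tab_separated and the sample columns (mention_id, type, text) are hypothetical and are not taken from the corpus; the actual file names and column layout are defined in the corpus documentation.

```python
import csv
import io
from typing import Iterable, Iterator


def read_tab_separated(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data row, keyed by the names in the header line."""
    # QUOTE_NONE treats quote characters as literal text, which is a common
    # choice for annotation tables; this is an assumption, not a corpus rule.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical sample data; real column names come from the corpus documentation.
SAMPLE = "mention_id\ttype\ttext\nM1\tentity\texample\n"

rows = list(read_tab_separated(io.StringIO(SAMPLE)))
print(rows[0]["type"])
```

To read a file on disk instead of the in-memory sample, pass an open file object, for example one returned by open(path, newline="", encoding="utf-8").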

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

