In this newsletter:
LDC at ICASSP 2023

New publications:
2019 NIST Speaker Recognition Evaluation Test Set - CTS 
Challenge<https://catalog.ldc.upenn.edu/LDC2023S03>
LORELEI Zulu Representative Language 
Pack<https://catalog.ldc.upenn.edu/LDC2023T06>
________________________________
LDC at ICASSP 2023
LDC will be exhibiting at ICASSP 2023<https://2023.ieeeicassp.org/>, held this 
year June 4-10 in Rhodes, Greece. Stop by booth 15 to learn more about recent 
developments at the Consortium and the latest publications.

LDC will post conference updates via Twitter<https://twitter.com/LDCupenn> and 
Facebook<https://www.facebook.com/ldc.upenn>. We look forward to seeing you 
there!
________________________________
New publications:
2019 NIST Speaker Recognition Evaluation Test Set - CTS 
Challenge<https://catalog.ldc.upenn.edu/LDC2023S03>, developed by LDC and NIST, 
contains 635 hours of Tunisian Arabic telephone recordings for development and 
test, along with answer keys, enrollment and trial files, and documentation from 
the CTS Challenge portion of the NIST-sponsored 2019 Speaker Recognition 
Evaluation<https://www.nist.gov/itl/iad/mig/nist-2019-speaker-recognition-evaluation>.
The 2019 evaluation was conducted in two parts: (1) a leaderboard-style 
challenge based on conversational telephone speech from LDC's Call My Net 2 
(CMN2) corpus; and (2) a separate evaluation using audio-visual material 
collected by LDC for the VAST (Video Annotation for Speech Technology) project 
(released as LDC2023V01<https://catalog.ldc.upenn.edu/LDC2023V01>).

The telephone speech data for the CTS Challenge was drawn from the CMN2 
collection conducted by LDC in Tunisia, in which Tunisian Arabic speakers called 
friends or relatives who agreed to record their telephone conversations, each 
lasting between 8 and 10 minutes. The speech segments include PSTN (public 
switched telephone network) and VoIP (voice over IP) data.

2023 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.
*
LORELEI Zulu Representative Language 
Pack<https://catalog.ldc.upenn.edu/LDC2023T06> consists of over 5 million 
words of Zulu monolingual text, 2.7 million words of found Zulu-English 
parallel text, and 71,000 Zulu words translated from English data. 
Approximately 100,000 words were annotated for named entities, and over 23,000 
words were annotated for entity discovery and linking and for situation frames 
(identifying entities, needs, and issues). Data was collected from discussion 
forum, news, reference, social network, and weblog sources.

The LORELEI (Low Resource Languages for Emergent Incidents) program was 
concerned with building human language technology for low-resource languages in 
the context of emergent situations. Representative languages were selected to 
provide broad typological coverage.

The knowledge base for entity linking annotation is available separately as 
LORELEI Entity Detection and Linking Knowledge Base 
(LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.

2023 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104



_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]