In this newsletter:
Renew your LDC membership today

New publications:
Iraqi Arabic - English Lexical 
Database<https://catalog.ldc.upenn.edu/LDC2025L01>
LORELEI Hungarian Representative Language 
Pack<https://catalog.ldc.upenn.edu/LDC2025T01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, 
and technology development drives LDC's mission to create them, to accept data 
contributions from researchers across the globe, and to broadly share such 
resources through the LDC Catalog. LDC members enjoy no-cost access to new 
corpora released annually, as well as the ability to license legacy data sets 
from among our 960+ holdings at reduced fees. Ensure that your data needs 
continue to be met by renewing your LDC membership or by joining the Consortium 
today.

Through March 3, 2025, organizations that were members in 2024 receive a 10% 
discount on 2025 membership, and new or returning organizations receive a 5% 
discount. 
Membership remains the most economical way to access current and past LDC 
releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for more 
details on membership options and benefits.
________________________________
New publications:

Iraqi Arabic - English Lexical 
Database<https://catalog.ldc.upenn.edu/LDC2025L01> was developed by LDC. It has 
six interrelated tables presenting over 67,000 Iraqi Arabic words as 
orthographic forms in Arabic script and pronunciation forms in IPA format, 
along with more than 120,000 English tokens.

This release is the result of a collaboration with Georgetown University Press 
<https://press.georgetown.edu/> to enhance and update three dialectal Arabic 
dictionaries -- Iraqi, Moroccan, and Syrian -- originally published in the 
1960s. The Georgetown Dictionary of Iraqi 
Arabic<https://press.georgetown.edu/Book/The-Georgetown-Dictionary-of-Iraqi-Arabic>
 was published in 2013. That work was based on, and expanded, two dictionaries, 
A Dictionary of Iraqi Arabic: English-Arabic (Clarity, Stowasser, and Wolfe, 
eds., 2003) and A Dictionary of Iraqi Arabic: Arabic-English (Woodhead and 
Beene, eds., 2003).

LDC's enhancements to the updated dictionary and the lexical database included 
facilitating comparisons across Arabic dialects and Modern Standard Arabic by 
providing Arabic script spellings and IPA pronunciations for Iraqi words and 
phrases; promoting ease of use for language learners and researchers by 
developing reasonable orthographic conventions for applying the Arabic alphabet 
to the dialect; and facilitating a user's understanding of morphological and 
lexical relations by adding information on the linguistic structures of Iraqi 
Arabic.

The documentation accompanying this release includes instructions for combining 
the tables in this corpus with the tables in Moroccan Arabic - English Lexical 
Database (LDC2023L01)<https://catalog.ldc.upenn.edu/LDC2023L01> into one 
database.

2025 members can access this corpus through their LDC accounts provided they 
have submitted a completed copy of the special license agreement. Non-members 
may license this data for a fee.

*

LORELEI Hungarian Representative Language 
Pack<https://catalog.ldc.upenn.edu/LDC2025T01> consists of over 686 million 
words of Hungarian monolingual text, 165,000 words of which were translated 
into English, 2.3 million words of found Hungarian-English parallel text, and 
87,000 Hungarian words translated from English data. Approximately 72,500 words 
were annotated for named entities; over 25,000 words were annotated for full 
entity (including nominals and pronouns), entity linking, and situation frames 
(identifying entities, needs, and issues); over 17,000 words have simple 
semantic annotation; and close to 10,000 words were annotated for noun phrase 
chunking. Data was collected from discussion forums, news, reference, social 
networks, and weblogs.

The LORELEI (Low Resource Languages for Emergent Incidents) program was 
concerned with building human language technology for low resource languages in 
the context of emergent situations. Representative languages were selected to 
provide broad typological coverage.

The knowledge base for entity linking annotation is available separately as 
LORELEI Entity Detection and Linking Knowledge Base 
(LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.

2025 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options, or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: [email protected]<mailto:[email protected]>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104