Fwd: FW: January 2018 Newsletter - LDC

lewis john mcgibbney Tue, 16 Jan 2018 09:33:08 -0800

FYI folks.

---------- Forwarded message ----------
From: Mcgibbney, Lewis J (398M) <lewis.j.mcgibb...@jpl.nasa.gov>
Date: Tue, Jan 16, 2018 at 9:09 AM
Subject: FW: January 2018 Newsletter - LDC
To: lewis john mcgibbney <lewi...@apache.org>







Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group (398M)

Instrument Software and Science Data Systems Section (398)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X






 Dare Mighty Things



*From: *Ldc-customers1 <ldc-customers1-boun...@ldc.upenn.edu> on behalf of
Penn LDC <l...@ldc.upenn.edu>
*Date: *Tuesday, January 16, 2018 at 8:15 AM
*To: *Penn LDC <l...@ldc.upenn.edu>
*Subject: *January 2018 Newsletter - LDC



*In this newsletter: *

*Membership Discounts for MY2018 Still Available*

*New Publications:*

DEFT Spanish Treebank <https://catalog.ldc.upenn.edu/LDC2018T01>

DIRHA English WSJ Audio <https://catalog.ldc.upenn.edu/LDC2018S01>

TRAD Chinese-French Parallel Text – Blog
<https://catalog.ldc.upenn.edu/LDC2018T02>



_______________________________________________________________________________



*Membership Discounts for MY2018 Still Available*

Join LDC while membership savings are still available. Now through March 1,
2018, renewing MY2017 members will receive a 10% discount off the
membership fee. New or non-consecutive member organizations will receive a
5% discount. Membership remains the most economical way to access LDC
releases. This year’s planned publications include Multilanguage
Conversational Telephone Speech, IARPA Babel Language Packs (telephone
speech and transcripts), DIRHA (Distant-speech Interaction for Robust Home
Applications), TRAD (Chinese-French and Arabic-French parallel text), data
from BOLT, DEFT, LORELEI, RATS and TAC KBP, and more. Browse the Members
<https://www.ldc.upenn.edu/members/join-ldc> pages for details on
membership options and benefits.

_______________________________________________________________________________


*New Publications:*



(1) DEFT Spanish Treebank <https://catalog.ldc.upenn.edu/LDC2018T01> was
developed by LDC and the Language and Computation Center (CLiC), University
of Barcelona <http://clic.ub.edu/>. It contains treebank annotation of
international Spanish newswire text and Latin American Spanish discussion
forum data created for the DARPA Deep Exploration and Filtering of Text
(DEFT) program. DEFT Spanish Treebank supported the program's goal of deep
natural language understanding.



Newswire source files were selected from Spanish Gigaword Third Edition (
LDC2011T12 <https://catalog.ldc.upenn.edu/ldc2011t12>) and were manually
sentence-segmented for DEFT. Discussion forum source files were selected
from Spanish discussion forum source data collected by LDC, consisting of
continuous multi-posts of 100-1000 words.



This release contains 114 files (54,394 tokens) of newswire data and 60
files (55,307 tokens) of discussion forum data, all of which were annotated
with constituents and syntactic functions.

DEFT Spanish Treebank is distributed via web download.



2018 Subscription Members will receive copies of this corpus. 2018 Standard
Members may request a copy as part of their 16 free membership corpora.
Non-members may license this data for US $1000.



*



(2) DIRHA English WSJ Audio <https://catalog.ldc.upenn.edu/LDC2018S01> was
developed as part of the Distant-Speech Interaction for Robust Home
Applications (DIRHA) Project <https://dirha.fbk.eu/> which addressed
natural spontaneous speech interaction with distant microphones in a
domestic environment. It comprises approximately 85 hours of real and
simulated read speech by six native American English speakers. The target
utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A
<https://catalog.ldc.upenn.edu/LDC93S6A/>), specifically the 5,000-word
subset of read speech from Wall Street Journal news text.



Speech was collected in a real apartment setting with typical domestic
background noise and inter/intra-room reverberation effects. Annotations,
speaker metadata and images of the apartment setting are also included.



DIRHA English WSJ Audio is distributed via web download.



2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for US $250.



*



(3) TRAD Chinese-French Parallel Text -- Blog
<https://catalog.ldc.upenn.edu/LDC2018T02> was developed by ELDA as part of
the PEA-TRAD project. It contains French translations of a subset of
approximately 10,000 Chinese words from GALE Phase 1 Chinese Blog Parallel
Text (LDC2008T06 <https://catalog.ldc.upenn.edu/LDC2008T06>).



The PEA-TRAD project (Translation as a Support for Document Analysis) was
supported by the French Ministry of Defense (DGA). Its purpose was to
develop speech-to-speech translation technology for multiple languages
(e.g., Arabic, Chinese, Pashto) from a variety of domains.



The source data for TRAD Chinese-French Parallel Text is Chinese blog text
collected and translated into English by LDC for the DARPA GALE (Global
Autonomous Language Exploitation) program. Information about the ELDA
translation team, translation guidelines and validation results is
contained in the documentation accompanying this release.



TRAD Chinese-French Parallel Text -- Blog is distributed via web download.



2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for US $250.





Membership Office

Linguistic Data Consortium <http://ldc.upenn.edu>

University of Pennsylvania

T: +1-215-573-1275

E: l...@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104







-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
