Thanks, Lewis. I’m also an org rep at LDC for NASA, and via my USC hat as well. Good show.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 5/20/16, 8:45 AM, "Lewis John Mcgibbney" <[email protected]> wrote:

>Hi Folks,
>I've ended up as the primary JPL organizational rep for the Linguistic Data
>Consortium. They produce monthly newsletters (see below for most
>recent) which I will be forwarding to dev@ Joshua from now on.
>They are pretty cool, especially the new datasets they publish.
>Lewis
>
>---------- Forwarded message ----------
>From: *Mcgibbney, Lewis J (398M)* <[email protected]>
>Date: Friday, May 20, 2016
>Subject: Fwd: May 2016 Newsletter – LDC
>To: "[email protected]" <[email protected]>
>
>Sent from my iPhone
>
>Begin forwarded message:
>
>*From:* Linguistic Data Consortium <[email protected]>
>*Date:* May 16, 2016 at 8:20:33 AM PDT
>*To:* Linguistic Data Consortium <[email protected]>
>*Subject:* *May 2016 Newsletter – LDC*
>
>*In this newsletter:*
>
>*LDC at LREC 2016*
>
>*New publications:*
>
>SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
>GALE Phase 4 Chinese Broadcast Conversation Speech
>GALE Phase 4 Chinese Broadcast Conversation Transcripts
>
>*LDC at LREC 2016*
>
>LDC will attend the 10th Language Resources and Evaluation Conference
>(LREC2016), hosted by ELRA, the European Language Resources Association. The
>conference will be held in Portorož, Slovenia from May 23-28 and features a
>broad range of sessions on language resources and human language
>technologies research. Seven LDC staff members will be presenting current
>work on topics including trends in HLT research, building language
>resources for autism spectrum disorders, data management plans, rapid
>development of morphological analyzers for typologically diverse languages,
>selection criteria for low resource language programs, multi-language
>speech collection for NIST LRE, novel incentives for collecting data and
>annotation from people, and more.
>
>Following the conference, LDC’s presented papers and posters will be
>available on LDC’s Papers Page
><https://www.ldc.upenn.edu/language-resources/papers/ldc-papers>.
>
>New Corpora
>
>(1) SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
><https://catalog.ldc.upenn.edu/LDC2016S03> consists of data, tools, system
>results, and publications associated with the 2014 and 2015 tasks on
>Broad-Coverage Semantic Dependency Parsing (SDP <http://sdp.delph-in.net/>)
>conducted in conjunction with the International Workshop on Semantic
>Evaluation (SemEval <http://alt.qcri.org/semeval2015/>) and was developed
>by the SDP task organizers.
>
>SemEval is an ongoing series of evaluations of computational semantic
>analysis systems intended to explore the nature of meaning in language. It
>evolved from the Senseval <http://www.senseval.org/> word sense
>disambiguation series to include semantic analysis tasks outside of word
>sense disambiguation.
>
>This release is based on English, Chinese and Czech data from the following
>resources: Treebank-2 LDC95T7 <https://catalog.ldc.upenn.edu/LDC95T7>,
>Proposition Bank I LDC2004T14 <https://catalog.ldc.upenn.edu/LDC2004T14>,
>NomBank v 1.0 LDC2008T23 <https://catalog.ldc.upenn.edu/LDC2008T23> and
>CCGBank LDC2005T13 <https://catalog.ldc.upenn.edu/LDC2005T13> (English);
>Chinese Treebank (e.g., Chinese Treebank 8.0 LDC2013T21
><https://catalog.ldc.upenn.edu/LDC2013T21>) (Chinese); and Prague
>Dependency Treebank (e.g., Prague Dependency Treebank 2.0, LDC2006T01
><https://catalog.ldc.upenn.edu/LDC2006T01>) (Czech).
>
>The results are presented as graphs in three target representations:
>MRS-Derived Semantic Dependencies (DM), Enju Predicate–Argument Structures
>(PAS), and Prague Semantic Dependencies (PSD). As a fourth, additional
>target representation, CCGbank was converted to semantic dependency graphs
>(in the subdirectory ‘ccd’).
>
>SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing is distributed
>via web download.
>
>2016 Subscription Members will automatically receive two copies of this
>corpus.
>2016 Standard Members may request a copy as part of their 16 free
>membership corpora. Non-members may license this data for US $400.
>
>*
>
>(2) GALE Phase 4 Chinese Broadcast Conversation Speech
><https://catalog.ldc.upenn.edu/LDC2016S03> was developed by LDC and is
>comprised of approximately 172 hours of Mandarin Chinese broadcast
>conversation speech collected in 2008 by LDC and Hong Kong University of
>Science and Technology during Phase 4 of the DARPA GALE (Global Autonomous
>Language Exploitation) Program.
>
>Corresponding transcripts are released as GALE Phase 4 Chinese Broadcast
>Conversation Transcripts (LDC2016T12
><http://catalog.ldc.upenn.edu/LDC2016T12>).
>
>The broadcast conversation recordings in this release feature interviews,
>call-in programs and roundtable discussions focusing principally on current
>events and are contained in 236 audio files presented in FLAC
><http://flac.sourceforge.net/>-compressed Waveform Audio File format
>(.flac), 16000 Hz single-channel 16-bit PCM. Each file was audited by a
>native Chinese speaker following Audit Procedure Specification Version 2.0
>which is included in this release.
>
>GALE Phase 4 Chinese Broadcast Conversation Speech is distributed via web
>download.
>
>2016 Subscription Members will automatically receive two copies of this
>corpus. 2016 Standard Members may request a copy as part of their 16 free
>membership corpora. Non-members may license this data for US $2000.
>
>*
>
>(3) GALE Phase 4 Chinese Broadcast Conversation Transcripts
><https://catalog.ldc.upenn.edu/LDC2016T12> was developed by LDC and
>contains transcriptions of approximately 172 hours of Chinese broadcast
>conversation speech collected in 2008 by LDC and Hong Kong University of
>Science and Technology during Phase 4 of the DARPA GALE (Global Autonomous
>Language Exploitation) Program.
>
>Corresponding audio data is released as GALE Phase 4 Chinese Broadcast
>Conversation Speech (LDC2016S03 <https://catalog.ldc.upenn.edu/LDC2016S03>).
>
>The transcript files are in plain-text, tab-delimited format (TDF) with
>UTF-8 encoding, and the transcribed data totals 2,259,952 tokens.
>
>The files in this corpus were transcribed by LDC staff and/or by
>transcription vendors under contract to LDC. Transcribers followed LDC’s
>quick transcription guidelines (QTR) and quick rich transcription
>specification (QRTR). QTR transcription consists of quick (near-)verbatim,
>time-aligned transcripts plus speaker identification with minimal
>additional mark-up. QRTR adds additional structural information such as
>topic boundaries and manual sentence unit annotation.
>
>GALE Phase 4 Chinese Broadcast Conversation Transcripts is distributed via
>web download.
>
>2016 Subscription Members will automatically receive two copies of this
>corpus. 2016 Standard Members may request a copy as part of their 16 free
>membership corpora. Non-members may license this data for US $1500.
>
>--
>Membership Office
>Linguistic Data Consortium
>University of Pennsylvania
>3600 Market St. Suite 810
>Philadelphia, PA 19130
>Tel: 215-573-1275
>Email: [email protected]
>Fax: 215-573-2175
>
>--
>*Lewis*
