Hi Folks,
I've ended up as the primary JPL organizational rep for the Linguistic Data
Consortium (LDC). They produce monthly newsletters (see below for the most
recent), which I will be forwarding to dev@ Joshua from now on.
They are pretty cool, especially the new datasets they publish.
Lewis

---------- Forwarded message ----------
From: *Mcgibbney, Lewis J (398M)* <[email protected]>
Date: Friday, May 20, 2016
Subject: Fwd: May 2016 Newsletter – LDC
To: "[email protected]" <[email protected]>


Begin forwarded message:

*From:* Linguistic Data Consortium <[email protected]>
*Date:* May 16, 2016 at 8:20:33 AM PDT
*To:* Linguistic Data Consortium <[email protected]>
*Subject:* *May 2016 Newsletter – LDC*

*In this newsletter:*

*LDC at LREC 2016*



*New publications:*

SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing

GALE Phase 4 Chinese Broadcast Conversation Speech

GALE Phase 4 Chinese Broadcast Conversation Transcripts





*LDC at LREC 2016*



LDC will attend the 10th Language Resources and Evaluation Conference
(LREC2016), hosted by ELRA, the European Language Resources Association. The
conference will be held in Portorož, Slovenia from May 23-28 and features a
broad range of sessions on language resources and human language
technologies research. Seven LDC staff members will be presenting current
work on topics including trends in HLT research, building language
resources for autism spectrum disorders, data management plans, rapid
development of morphological analyzers for typologically diverse languages,
selection criteria for low resource language programs, multi-language
speech collection for NIST LRE, novel incentives for collecting data and
annotation from people, and more.



Following the conference, LDC’s presented papers and posters will be
available on LDC’s Papers Page
<https://www.ldc.upenn.edu/language-resources/papers/ldc-papers>.





New Corpora



(1) SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
<https://catalog.ldc.upenn.edu/LDC2016T10> consists of data, tools, system
results, and publications associated with the 2014 and 2015 tasks on
Broad-Coverage Semantic Dependency Parsing (SDP <http://sdp.delph-in.net/>)
conducted in conjunction with the International Workshop on Semantic
Evaluation (SemEval <http://alt.qcri.org/semeval2015/>) and was developed
by the SDP task organizers.

SemEval is an ongoing series of evaluations of computational semantic
analysis systems intended to explore the nature of meaning in language. It
evolved from the Senseval <http://www.senseval.org/> word sense
disambiguation series to include semantic analysis tasks outside of word
sense disambiguation.

This release is based on English, Chinese and Czech data from the following
resources: Treebank-2 LDC95T7 <https://catalog.ldc.upenn.edu/LDC95T7>,
Proposition Bank I LDC2004T14 <https://catalog.ldc.upenn.edu/LDC2004T14>,
NomBank v 1.0 LDC2008T23 <https://catalog.ldc.upenn.edu/LDC2008T23> and
CCGbank LDC2005T13 <https://catalog.ldc.upenn.edu/LDC2005T13> (English);
Chinese Treebank (e.g., Chinese Treebank 8.0 LDC2013T21
<https://catalog.ldc.upenn.edu/LDC2013T21>) (Chinese); and Prague
Dependency Treebank (e.g., Prague Dependency Treebank 2.0, LDC2006T01
<https://catalog.ldc.upenn.edu/LDC2006T01>) (Czech).

The results are presented as graphs in three target representations:
MRS-Derived Semantic Dependencies (DM), Enju Predicate–Argument Structures
(PAS), and Prague Semantic Dependencies (PSD). As a fourth, additional
target representation, CCGbank was converted to semantic dependency graphs
(in the subdirectory ‘ccd’).

SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing is distributed
via web download.

2016 Subscription Members will automatically receive two copies of this
corpus. 2016 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $400.



*

(2) GALE Phase 4 Chinese Broadcast Conversation Speech
<https://catalog.ldc.upenn.edu/LDC2016S03> was developed by LDC and
comprises approximately 172 hours of Mandarin Chinese broadcast
conversation speech collected in 2008 by LDC and Hong Kong University of
Science and Technology during Phase 4 of the DARPA GALE (Global Autonomous
Language Exploitation) Program.

Corresponding transcripts are released as GALE Phase 4 Chinese Broadcast
Conversation Transcripts (LDC2016T12
<http://catalog.ldc.upenn.edu/LDC2016T12>).

The broadcast conversation recordings in this release feature interviews,
call-in programs and roundtable discussions focusing principally on current
events and are contained in 236 audio files presented in FLAC
<http://flac.sourceforge.net/>-compressed Waveform Audio File format
(.flac), 16000 Hz single-channel 16-bit PCM. Each file was audited by a
native Chinese speaker following Audit Procedure Specification Version 2.0,
which is included in this release.

GALE Phase 4 Chinese Broadcast Conversation Speech is distributed via web
download.



2016 Subscription Members will automatically receive two copies of this
corpus. 2016 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $2000.



*

(3) GALE Phase 4 Chinese Broadcast Conversation Transcripts
<https://catalog.ldc.upenn.edu/LDC2016T12> was developed by LDC and
contains transcriptions of approximately 172 hours of Chinese broadcast
conversation speech collected in 2008 by LDC and Hong Kong University of
Science and Technology during Phase 4 of the DARPA GALE (Global Autonomous
Language Exploitation) Program.

Corresponding audio data is released as GALE Phase 4 Chinese Broadcast
Conversation Speech (LDC2016S03 <https://catalog.ldc.upenn.edu/LDC2016S03>).

The transcript files are in plain-text, tab-delimited format (TDF) with
UTF-8 encoding, and the transcribed data totals 2,259,952 tokens.

The files in this corpus were transcribed by LDC staff and/or by
transcription vendors under contract to LDC. Transcribers followed LDC’s
quick transcription guidelines (QTR) and quick rich transcription
specification (QRTR). QTR transcription consists of quick (near-) verbatim,
time-aligned transcripts plus speaker identification with minimal
additional mark-up. QRTR adds additional structural information such as
topic boundaries and manual sentence unit annotation.

GALE Phase 4 Chinese Broadcast Conversation Transcripts is distributed via
web download.

2016 Subscription Members will automatically receive two copies of this
corpus. 2016 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $1500.


-- 
Membership Office
Linguistic Data Consortium
University of Pennsylvania
3600 Market St. Suite 810
Philadelphia, PA 19130
Tel: 215-573-1275
email: [email protected]
Fax: 215-573-2175




-- 
*Lewis*
