Dear ELRA Member,
Here is the latest news about the most noteworthy activities conducted
at ELRA and ELDA in February 2017. We would like to remind you that we
welcome your suggestions and comments on the topics presented below, and
on any other topic you would like to see covered in future bulletins.
*1. ABOUT MEMBERSHIP*
*1.1. Membership*
For the period from 1st to 28th February 2017, the total number of
paid-up members is 25.
*1.2. Membership Drive*
As a follow-up to the September 2016 meeting, a brainstorming meeting on
the ELRA membership drive and related ELRA services took place on 31st
January 2017 in Paris with Nicoletta Calzolari, Nick Campbell, Khalid
Choukri, Henk van den Heuvel and Joseph Mariani. A report will be
drafted and shared with the ELRA Board and members by spring 2017.
*1.3. LREC 2018*
In mid-February, the First Call for Papers was published on
http://www.lrec-conf.org/lrec2018/lrec2018-cfp.htm and circulated on
mailing lists and on Twitter (@LREC2018, #LREC2018). It was also sent to
all LREC 2016 participants. The 11th edition of LREC will be held on
May 7-12, 2018 in Miyazaki, Japan. A temporary web page has been set up
at http://www.lrec-conf.org/lrec2018/lrec2018.htm and will be updated
until the permanent website is published.
*2. RESOURCES*
We are happy to announce that one new Evaluation Package is now
available in our catalogue.
*ELRA-E0046 ETAPE Evaluation Package*
*ISLRN: 425-777-374-455-4*
The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV
data, selected to include mostly unplanned speech and a reasonable
proportion of multiple-speaker data. All data were carefully
transcribed, including named-entity annotation.
This package includes the material that was used for the ETAPE
evaluation campaign: the resources, scoring tools, campaign results,
etc. that were used or produced during the campaign. The aim of this
evaluation package is to enable external players to evaluate their own
systems and compare their results with those obtained during the
campaign itself.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=12
For more information on the catalogue, please contact Valérie Mapelli
mailto:[email protected]
If you would like to enquire about having your resources distributed by
ELRA, please do not hesitate to contact us.
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates:
http://www.elra.info/LRs-Announcements.html
*2.1. ISLRN*
This month, the following resources have been allocated ISLRNs.
• SALA II US English database (2000 speakers): ISLRN 829-229-153-801-9
• ETAPE Evaluation Package: ISLRN 425-777-374-455-4
• A multilingual, multi-style and multi-granularity dataset for
cross-language textual similarity detection: ISLRN 723-785-513-738-2
• First-Year Law Students' Court Memoranda: ISLRN 141-827-463-794-4
• GALE Phase 3 Arabic Broadcast News Speech Part 2: ISLRN 459-849-510-597-1
• GALE Phase 3 Arabic Broadcast News Transcripts Part 2:
ISLRN 539-362-793-352-9
• IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b:
ISLRN 763-119-338-310-1
*3. PROJECTS AND INITIATIVES*
*3.1. Production Projects*
_*Sentiment annotation in French tweets*_
ELDA has started a large annotation project consisting of in-depth
sentiment and opinion tagging of French-language tweets. Several
annotators have been hired and work is already under way. For this
project, several natural language processing and data validation tools
developed at ELDA for previous projects are being re-used to increase
the productivity of the annotation team and to improve the quality of
the annotations.
In February, ELDA pursued its activities in the French tweet opinion
annotation project and made several deliveries, to the full satisfaction
of the customer.
*3.2. Projects*
_*CRACKER (Cracking the Language Barrier: Coordination, Evaluation and
Resources for European MT Research)*_
CRACKER is a Coordination and Support Action under the H2020 Programme
of the European Commission. This action has just started; its kick-off
meeting was held in Berlin last February 10th and was organised by its
coordinator, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
(DFKI). The other members of the Consortium
are: Charles University of Prague (CUNI), Czech Republic; Evaluations
and Language Resources Distribution Agency SA (ELDA), France; Fondazione
Bruno Kessler (FBK), Italy; Athena Research and Innovation Center in
Information, Communication and Knowledge Technologies (ATHENA RC),
Greece; University of Edinburgh (UEDIN), UK, and University of Sheffield
(USFD), UK.
CRACKER aims to provide planned coordination and support to the
European machine translation research community, which is under
pressure from the current challenges and needs of the Digital Single
Market.
ELDA has decided to resume work on META-SHARE upgrades, in close
cooperation with ILSP. The first step is to merge ELDA's and ILSP's
contributions and to publish them on the META-SHARE GitHub repository.
_*CEF Language Resource Coordination*_
The SMART 2014/1074 Language Resource Coordination, funded by the CEF
(Connecting Europe Facility) programme, was launched during the Riga
Summit, held in late April 2015 in Latvia. The objectives of this 2-year
project are to:
• improve the availability of, and simplify access to, language
resources (LRs) relevant for MT;
• establish an observatory for language resources across EU Member
States and CEF associated countries;
• raise awareness among stakeholders about the value and use of data
for automated translation; and
• clarify legal and commercial issues related to the data.
Targeted data are those produced by the public sector in the EU, which
can be made available for re-use through the EU Open Data portal, with
suitable copyright protection.
The project is coordinated by Deutsches Forschungszentrum für Künstliche
Intelligenz GmbH (DFKI) and the other members of the European Language
Resources Coordination Consortium (ELRC) are ELRA, TILDE, ILSP and TAUS.
Eight tasks have been specified for this programme, and ELDA will lead
three of them: the setup of the technical Helpdesk (T2), the
organisation of 30+ training workshops (T6) and the Language Resources
collection (T7).
In February, ELDA continued its main activities regarding the production
and validation of data, the upgrading of the data processing and
packaging tools, as well as discussions with potential donors.
Specific effort was dedicated to 1) updating the validation guidelines
and drafting a validation report template to be used in the coming
validation phase, 2) running a deeper analysis of legal issues with
respect to donated data, as well as supporting partners in drafting
specific user agreements with donors, and 3) maintaining and upgrading
the crawled data management toolkit, mainly to improve the integration
of manual validation and to allow the toolkit to be used for handling
and validating donated data.
The ELRC website, at http://elrc.tilde.com/home, provides information on
the project and access to services such as the Helpdesk.
_*European Language Resource Coordination +*_
Following the work of the European Language Resource Coordination (ELRC)
action (http://lr-coordination.eu/) within CEF.AT, the European
Commission has launched two further actions under the same principles
and also within the Connecting Europe Facility (CEF) Programme:
• SMART 2015/1091 Tools and Resources for CEF Automated Translation
Lot 2 (ELRC+2)
• SMART 2015/1091 Tools and Resources for CEF Automated Translation
Lot 3 (ELRC+3)
Both of them are 3-year actions and their inception meetings with the
European Commission took place on January 17th, 2017, in Luxembourg.
_*European Language Resource Coordination +2 (ELRC+2)*_
The inception meeting of the European Language Resource Coordination +2
(ELRC+2) took place at the EC premises in Luxembourg, with the
participation of the ELRC+2 Consortium, namely, ELDA (France), DFKI
(Germany), ILSP (Greece) and TILDE (Latvia), as well as representatives
of DG Connect and DG Translation from the EC.
The goals of this 3-year project are to:
• set up and operate a repository to host Language Resources to
support MT systems within CEF Automated Translation platform;
• set up and operate an intellectual property rights (IPR) support
and clearance desk for Language Resources;
• complement and continue the Language Resource coordination activities
undertaken under the ELRC service contract (SMART 2014/1074), such as
improving the availability of LRs held by the public sector,
establishing an observatory for LRs across EU Member States and CEF
associated countries, and raising awareness among public data holders
of the value of LRs for MT.
The project, which will be coordinated by DFKI, comprises ten tasks.
ELDA will lead three of them:
• the technical helpdesk (T3);
• the legal helpdesk (T4); and
• the IPR clearance of 200 LRs (T5).
In February, the consortium finalised the inception report, which further
specifies the methodology, agreed progress indicators, resources and
objectives in accordance with the feedback provided by the EC during the
inception meeting. Within T8 (country-specific workshops), the consortium
produced a draft workshop concept and master agenda to be approved
by the EC. ELRC+2 workshops constitute the second round of ELRC
outreach activities. The main novelty of this new series of workshops
lies in the reinforcement of the policy-level component targeting
decision-makers, as well as in the introduction of a hands-on session
for data holders and potential contributors.
Fortnightly web conferences with the EC have continued, covering topics
such as the involvement of DGT in ELRC activities and the organisation
of the ELRC conference (T7), to be held before the end of 2017.
_*European Language Resource Coordination +3 (ELRC+3)*_
ELRC+3 brings together the ELRC Consortium members as partners of the
present action: Tilde (coordinator, Latvia), ELDA (France), DFKI
(Germany) and ILSP (Greece). The main objective of ELRC+3 is to continue
ELRC's ongoing work of helping the EC obtain resources for the training
and optimisation of the CEF Automated Translation platform, for the CEF
languages, and in domains of interest to the CEF Digital Service
Infrastructures (DSIs). For that purpose, this action aims to identify,
collect, clear, produce, process and make available further resources to
the EC.
In this context, ELDA will lead the following activities:
• Adaptation of the existing ELRC database of sources, revising and
customising it for the new needs and requirements.
• Identification of licensing conditions and rights holder(s) for the
new resources.
• Dissemination activities, also in support of the ELRC+2 action.
• Anonymisation of language resource databases: this will depend on
the requirements of the language resource stakeholders and regulations
on personal data protection.
• Validation of language resources and their metadata, which implies
the quality evaluation of each deliverable language resource (both
monolingual and parallel).
• Clearing of IPRs and other legal issues that may arise for the data
collected.
In February, the inception report was reviewed by the European
Commission (EC), and its final version is being prepared for March,
following the EC's recommendations. In the meantime, work has started
on the different tasks, in particular the identification and processing
of an initial batch of language resources.
With regard to dissemination, the ELRC website will be enhanced to
accommodate the needs of the new ELRC+2 and ELRC+3 projects.
Discussion has also started on the new on-site assistance instrument
planned within the project. This assistance is intended to go beyond
what is currently offered within ELRC, supporting data owners with their
technical questions related to data processing and provision.
_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list