[Mt-list] ELRA Members' News - February 2017

ELRA ELDA Information Sat, 11 Mar 2017 03:47:13 -0800

Dear ELRA Member,

Here is the latest news about the most noteworthy activities conducted at ELRA and ELDA in February 2017. We would like to remind you that we welcome your suggestions and comments on the topics presented below, and on any other topic you would like to include in the next bulletins.


*1.    ABOUT MEMBERSHIP*
*1.1.    Membership*

For the period from 1st to 28th February 2017, the total number of paid-up members is 25.


*1.2.    Membership Drive *

As a follow-up to the September 2016 meeting, a brainstorming meeting on the ELRA membership drive and related ELRA services took place on 31st January 2017 in Paris with Nicoletta Calzolari, Nick Campbell, Khalid Choukri, Henk van den Heuvel and Joseph Mariani. A report will be drafted and shared with the ELRA Board and members by Spring 2017.


*1.3.    LREC 2018 *

In mid-February, the First Call for Papers was published on http://www.lrec-conf.org/lrec2018/lrec2018-cfp.htm and circulated on the mailing lists and on Twitter (@LREC2018, #LREC2018). It was also sent to all the LREC 2016 participants. The 11th edition of LREC will be held on May 7-12, 2018 in Miyazaki, Japan. A temporary web page has been set up at http://www.lrec-conf.org/lrec2018/lrec2018.htm and will be updated until the publication of the permanent web site.


*2.    RESOURCES*

We are happy to announce that one new Evaluation Package is now available in our catalogue.


*ELRA-E0046 ETAPE Evaluation Package*
*ISLRN: 425-777-374-455-4*

The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV data, selected to include mostly non-planned speech and a reasonable proportion of multiple-speaker data. All data were carefully transcribed, including named entity annotation.

This package includes the material that was used for the ETAPE evaluation campaign. It includes resources, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of this evaluation package is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

For more information, see: http://catalog.elra.info/product_info.php?products_id=12

For more information on the catalogue, please contact Valérie Mapelli mailto:[email protected]

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html


*2.1.    ISLRN*
This month, the following resources have been allocated an ISLRN.

*Title* | *ISLRN*
SALA II US English database (2000 speakers) | 829-229-153-801-9
ETAPE Evaluation Package | 425-777-374-455-4
A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection | 723-785-513-738-2
First-Year Law Students' Court Memoranda | 141-827-463-794-4
GALE Phase 3 Arabic Broadcast News Speech Part 2 | 459-849-510-597-1
GALE Phase 3 Arabic Broadcast News Transcripts Part 2 | 539-362-793-352-9
IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b | 763-119-338-310-1


*3.    PROJECTS AND INITIATIVES*
*3.1.    Production Projects*
_*Sentiment annotation in French tweets*_

ELDA has started a large annotation project consisting of deep sentiment and opinion tagging of tweets in the French language. Several annotators have been hired and work has already been undertaken. On this occasion, several natural language processing and data validation tools developed at ELDA for previous projects are being re-used to boost the productivity of the annotation team and to improve the quality of the annotations.

In February, ELDA pursued its activities in the French tweet opinion annotation project and made several deliveries, to the full satisfaction of the customer.


*3.2.    Projects*

_*CRACKER (Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research)*_

CRACKER is a Coordination and Support Action under the H2020 Programme of the European Commission. This action has just started and held its kick-off meeting in Berlin last February 10th. The meeting was organised by its coordinator, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI). The other members of the Consortium are: Charles University of Prague (CUNI), Czech Republic; Evaluations and Language Resources Distribution Agency SA (ELDA), France; Fondazione Bruno Kessler (FBK), Italy; Athena Research and Innovation Center in Information, Communication and Knowledge Technologies (ATHENA RC), Greece; University of Edinburgh (UEDIN), UK; and University of Sheffield (USFD), UK.

CRACKER aims at providing planned coordination and support to the European machine translation research community, which is under pressure from the current challenges and needs of the Digital Single Market.

ELDA has decided to undertake META-SHARE upgrades again, working in close cooperation with ILSP. The first step is to merge ELDA's and ILSP's contributions and to publish them on the META-SHARE GitHub repository.


_*CEF Language Resource Coordination*_

The SMART 2014/1074 Language Resource Coordination, funded by the CEF (Connecting Europe Facility) programme, was launched during the Riga Summit, held late April 2015 in Latvia. The objectives of this 2-year project are to:

•    improve availability and simplify access to language resources (LRs) relevant for MT,
•    establish an observatory for language resources across EU Member States and CEF associated countries,
•    raise awareness among stakeholders about the value and use of data for automated translation,
•    clarify legal and commercial issues related to the data.

Targeted data are those produced by the public sector in the EU, which can be made available for re-use through the EU Open Data portal, with suitable copyright protection.

The project is coordinated by Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI) and the other members of the European Language Resources Coordination Consortium (ELRC) are ELRA, TILDE, ILSP and TAUS. Eight tasks have been specified for this programme and ELDA will lead three of them: the setup of the technical Helpdesk (T2), the organization of 30+ training workshops (T6) and the Language Resources collection (T7).

In February, ELDA continued its main activities regarding the production and validation of data, the upgrading of the data processing and packaging tools, as well as discussions with potential donors. Specific effort was dedicated to 1) updating the validation guidelines and drafting a validation report template to be used in the coming validation phase, 2) running a deeper analysis of legal issues with respect to donated data and supporting partners in drafting specific user agreements with donors, and 3) maintaining and upgrading the crawled data management toolkit, mainly to enhance the manual validation integration and to allow the toolkit to be used for donated data handling and validation.

The ELRC website, at http://elrc.tilde.com/home, provides information on the project and access to services such as the Helpdesk.


*_European Language Resource Coordination +_*

Following the work of the European Language Resource Coordination (ELRC) action (http://lr-coordination.eu/) within CEF.AT, the European Commission has launched two further actions under the same principles and also within the Connecting Europe Facility (CEF) Programme:

•    SMART 2015/1091 Tools and Resources for CEF Automated Translation Lot 2 (ELRC+2)
•    SMART 2015/1091 Tools and Resources for CEF Automated Translation Lot 3 (ELRC+3)

Both of them are 3-year actions and their inception meetings with the European Commission took place on January 17th, 2017, in Luxembourg.


_*European Language Resource Coordination +2 (ELRC+2) *_

The inception meeting of the European Language Resource Coordination +2 (ELRC+2) took place at the EC premises in Luxembourg, with the participation of the ELRC+2 Consortium, namely, ELDA (France), DFKI (Germany), ILSP (Greece) and TILDE (Latvia), as well as representatives of DG Connect and DG Translation from the EC.

The goals of this 3-year project are to:

•    set up and operate a repository to host Language Resources to support MT systems within the CEF Automated Translation platform;
•    set up and operate an intellectual property rights (IPR) support and clearance desk for Language Resources;
•    complement and continue Language Resource coordination activities undertaken under the ELRC service contract (SMART 2014/1074), such as improving the availability of LRs held by the public sector, establishing an observatory for LRs across EU Member States and CEF associated countries and raising awareness among public data holders of the value of LRs for MT.

The project, which will be coordinated by DFKI, comprises ten tasks. ELDA will be leading three of them:

•    the technical helpdesk (T3),
•    the legal helpdesk (T4),
•    the IPR Clearance of 200 LRs (T5).

In February the consortium finalized the inception report, which further specifies the methodology, agreed progress indicators, resources and objectives in accordance with the feedback provided by the EC during the inception meeting. Within T8 (country-specific workshops) the consortium produced a draft for a workshop concept and master agenda to be approved by the EC. ELRC+2 workshops constitute the second round of ELRC reach-out activities. The main novelty of this new series of workshops lies in the reinforcement of the policy-level component targeting decision-makers, as well as in the introduction of a hands-on session for data holders and potential contributors.

Fortnightly web conferences with the EC have continued to take place in order to discuss topics such as the involvement of DGT in ELRC activities and the organisation of the ELRC conference (T7) to be held before the end of 2017.

_*European Language Resource Coordination +3 (ELRC+3) *_

ELRC+3 counts on the ELRC Consortium Members as partners of the present action: Tilde (coordinator - Latvia), ELDA (France), DFKI (Germany) and ILSP (Greece). The main objective of ELRC+3 is to continue the ELRC's ongoing work in helping the EC obtain resources for the training and optimization of the CEF Automated Translation platform, for the CEF languages, and in domains of interest to the CEF Digital Service Infrastructures (DSIs). For that purpose, this action aims to identify, collect, clear, produce, process and make available further resources to the EC.

 
In this context, ELDA will be leading the following activities:

•    Adaptation of the existing ELRC database of sources, revising and customising it for the new needs and requirements.
•    Identification of licensing conditions and right holder(s) for the new resources.
•    Dissemination activities, also in support of the ELRC+2 action.
•    Anonymisation of language resource databases: this will depend on the requirements of the language resource stakeholders and regulations on personal data protection.
•    Validation of language resources and their metadata, which implies the quality evaluation of each deliverable language resource (both monolingual and parallel).
•    Clearing of IPRs and other legal issues that may arise for the data collected.

In February, the inception report was reviewed by the European Commission (EC), and its final version is under preparation for March, following the EC's recommendations. In the meantime, work has started on the different tasks, in particular concerning the identification and processing of an initial batch of language resources. With regard to dissemination, the ELRC website is going to be enhanced so as to accommodate the needs of the new ELRC+2 and ELRC+3 projects. Discussion has also started on the new on-site assistance instrument that is planned within the project. This assistance is intended to go beyond that currently offered within ELRC, supporting data owners with their technical questions related to data processing and provision.

_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list
