Shared task on automatic identification of verbal multiword expressions



*Apologies for cross-posting*

The PARSEME <http://parseme.eu> shared task on automatic identification of verbal multiword expressions (VMWEs) aims at identifying verbal MWEs in running texts. Verbal MWEs include idioms (*let the cat out of the bag*), light verb constructions (*make a decision*), verb-particle constructions (*give up*), and inherently reflexive verbs (*se suicider* 'to suicide' in French). Their identification is a well-known challenge for NLP applications, due to their complex characteristics: discontinuity, non-compositionality, heterogeneity and syntactic variability.

The shared task is highly multilingual: we intend to cover more than 20 languages from as many countries. PARSEME members have elaborated annotation guidelines based on annotation experiments in 16 languages from several language families. These guidelines take both universal and language-specific phenomena into account. We hope that this will boost the development of language-independent and cross-lingual VMWE identification systems.

Participation is open worldwide. We will provide two corpora to the participants:

* Training corpora in all languages, annotated manually for VMWEs according to common guidelines. This dataset will be sent to the participants in advance in order to allow them to adapt their systems.

* Raw (unannotated) test corpora in all languages, to be used as input to the systems. The gold annotations for this corpus, produced according to the same guidelines, will be kept secret.

Participants will provide the output produced by their systems on the test corpus. This output will be compared with the gold standard (ground truth). Evaluation metrics are precision, recall and F1, both strict and fuzzy.
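
As a rough illustration of the two evaluation modes, here is a minimal Python sketch. It is not the official evaluation script. It assumes that each VMWE is represented as the set of (sentence, token) positions of its components, that "strict" means a predicted VMWE counts only if it matches a gold VMWE exactly, and that "fuzzy" means credit is given per token shared between predicted and gold VMWEs. The names `strict_scores` and `fuzzy_scores` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a VMWE is the set of (sentence index, token index) positions
# of its components, so discontinuous expressions are represented naturally.
Token = Tuple[int, int]
VMWE = FrozenSet[Token]
Scores = Tuple[float, float, float]  # (precision, recall, F1)


def prf(true_pos: int, n_pred: int, n_gold: int) -> Scores:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_scores(gold: Set[VMWE], pred: Set[VMWE]) -> Scores:
    """Assumed strict mode: a prediction must match a gold VMWE exactly."""
    return prf(len(gold & pred), len(pred), len(gold))


def fuzzy_scores(gold: Set[VMWE], pred: Set[VMWE]) -> Scores:
    """Assumed fuzzy mode: credit for every token that is in both a gold
    VMWE and a predicted VMWE, regardless of how tokens are grouped."""
    gold_tokens = set().union(*gold)
    pred_tokens = set().union(*pred)
    return prf(len(gold_tokens & pred_tokens), len(pred_tokens), len(gold_tokens))


if __name__ == "__main__":
    # Toy data: two gold VMWEs; the system finds one fully and one partially.
    gold = {frozenset({(0, 1), (0, 4)}), frozenset({(1, 0), (1, 1)})}
    pred = {frozenset({(0, 1), (0, 4)}), frozenset({(1, 0)})}
    print("strict:", strict_scores(gold, pred))
    print("fuzzy: ", fuzzy_scores(gold, pred))
```

In this toy example the partially recognized expression earns nothing under the strict measure but still contributes its one correct token under the fuzzy measure.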



Potential participant teams should register using the form below:


Task updates and questions will be posted to our public mailing list:


A sample training file in English is provided here:


A sample input file to be annotated is provided here:


More details on the annotation of the corpora can be found here:


Publication and workshop


Participants will be invited to submit a system description paper. We intend to organize a dedicated shared task session as part of the annual MWE workshop. We have applied to hold the workshop as an event co-located with EACL 2017 <http://eacl2017.org>, which will take place in Valencia, Spain, from April 3 to 7.

Important dates^1


 * Oct 14, 2016: first Call for Participation

 * Nov 15, 2016: second Call for Participation

 * Dec 13, 2016: trial data and evaluation script released

 * Jan 6, 2017: training data released

 * Jan 10, 2017: final Call for Participation

 * Jan 20, 2017: blind test data released

 * Jan 27, 2017: submission of system results

 * Jan 30, 2017: announcement of results

 * Feb 5, 2017: submission of shared task system description papers

 * Feb 12, 2017: notification of acceptance

 * Feb 19, 2017: camera-ready system description papers due

 * April 2017: shared task workshop

1: Dates conditioned by the acceptance of the workshop proposal at EACL 2017.

Organizing team


Agata Savary, Veronika Vincze, Antoine Doucet, Federico Sangati, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Silvio Cordeiro, Voula Giouli, Carlos Ramisch, Ivelina Stoyanova.