================================================

FIRST CALL FOR PARTICIPATION

SemEval 2022 Task 2: Multilingual Idiomaticity Detection and Sentence
Embedding

https://sites.google.com/view/semeval2022task2-idiomaticity

We are excited to announce SemEval 2022 Task 2, which seeks to encourage
the development of methods aimed at better identification and
representation of idiomatic multiword expressions (MWEs).

Motivation
================================================

By and large, composing word representations has been successful in
capturing the meaning of sentences. However, there is an important set of
phrases, those which are idiomatic, that are inherently not compositional.
Early attempts to represent idiomatic phrases in non-contextual embeddings
involved extracting frequently occurring n-grams from text (such as “big
fish”) before learning a representation of each phrase based on its
context. However, the effectiveness of this method drops off significantly
as the length of the idiomatic phrase increases, as a result of data
sparsity. More recent studies show that even state-of-the-art pre-trained
contextual models (e.g. BERT) cannot accurately represent idiomatic
expressions.

Task Overview
================================================
Given this shortcoming in existing state-of-the-art models, this task (part
of SemEval 2022) is aimed at detecting and representing multiword
expressions (MWEs) which are potentially idiomatic phrases across English,
Portuguese and Galician, in both zero-shot and one-shot settings.

Participants have the freedom to choose a subset of subtasks or variations
(settings) that they'd like to participate in. You can NOT pick a subset of
languages.

This task consists of two subtasks:

*Subtask A*
------------------------------------------------
A binary classification task aimed at determining whether a sentence
contains an idiomatic expression.

*Subtask B*
------------------------------------------------
The task of generating a sentence embedding that represents the correct
meaning of the sentence, be it idiomatic or literal, as measured by
semantic text similarity between sentences.
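
As a rough illustration of what comparing sentence embeddings by similarity
can look like, here is a minimal sketch. It assumes cosine similarity as the
measure, which is a common choice but is not specified in this announcement;
the function name and the toy vectors are hypothetical and are not taken
from the task data or its official evaluation scripts.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real system would produce these vectors with a sentence encoder.
idiomatic_sentence = [0.9, 0.1, 0.0]    # sentence using an MWE idiomatically
correct_paraphrase = [0.8, 0.2, 0.1]    # paraphrase of the idiomatic meaning
literal_paraphrase = [0.1, 0.9, 0.3]    # paraphrase of the literal meaning

sim_correct = cosine_similarity(idiomatic_sentence, correct_paraphrase)
sim_literal = cosine_similarity(idiomatic_sentence, literal_paraphrase)

# An embedding that represents the idiomatic meaning well should place the
# sentence closer to the correct paraphrase than to the literal one.
print(sim_correct > sim_literal)
```

For the scoring actually used in Subtask B, see the task website.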


Important Dates
================================================
Training data available: September 3, 2021 [NOW AVAILABLE]

Evaluation start: January 10, 2022

Evaluation end: (TBC) January 31, 2022

Paper submissions due: (TBC) February 23, 2022

Notification to authors: March 31, 2022

Organisation
================================================
Harish Tayyar Madabushi, University of Sheffield, UK.
Edward Gow-Smith, University of Sheffield, UK.
Marcos Garcia, Universidade de Santiago de Compostela, Spain.
Carolina Scarton, University of Sheffield, UK.
Marco Idiart, Federal University of Rio Grande do Sul, Brazil.
Aline Villavicencio, University of Sheffield, UK.


For more information, see:
https://sites.google.com/view/semeval2022task2-idiomaticity


Best wishes,
Harish

-- 
*Carolina Scarton*
Academic Fellow
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/
