LeWiDi: Shared task on Learning With Disagreement

We'd like to invite researchers working on disagreement and variation to participate 
in the third edition of the LeWiDi shared task, held in conjunction with the 
NLPerspectives workshop at the EMNLP conference in Suzhou, China.
The LeWiDi series is positioned within the growing body of research that 
questions the practice of label harmonization and the reliance on a single 
ground truth in AI and NLP. This year's shared task challenges participants to 
leverage both instance-level disagreement and annotator-level information in 
classification. The proposed tasks include ones that address disagreement in 
both generation and labeling (with a dataset for Natural Language Inference 
(NLI) and another for paraphrase detection), as well as subjective tasks 
such as irony and sarcasm detection.
==== Subtasks and Datasets ====

Participants will be able to submit to subtasks exploring different types of 
disagreement through four dedicated datasets:
1. The Conversational Sarcasm Corpus (CSC) – a dataset of context+response 
pairs rated for sarcasm on a scale from 1 to 6.
2. The MultiPico dataset (MP) – a crowdsourced multilingual irony detection 
dataset. Annotators were asked to judge whether a reply was ironic in the 
context of a brief post-reply exchange on social media. Annotator IDs and 
metadata (gender, age, nationality, etc.) are available. Languages include 
Arabic, German, English, Spanish, French, Hindi, Italian, Dutch, and Portuguese.
3. The Paraphrase dataset (Par) – a dataset of question pairs for which the 
annotators had to judge whether the two questions are paraphrases of each other, 
using values on a Likert scale.
4. The VariErrNLI dataset (VariErrNLI) – a dataset originally designed for 
automatic error detection, distinguishing annotation errors from legitimate 
human label variation in Natural Language Inference (NLI).
Participants will be able to submit to one or more of the datasets.
==== Tasks and Evaluation ====
In this edition, only soft evaluation metrics will be used. We will, however, 
experiment with two task formats and their evaluation:

  *   TASK A (SOFT LABEL PREDICTION): Systems will be asked to output a 
probability distribution over the label values. EVALUATION: the distance between 
this predicted soft label and the one resulting from the human annotations will 
be computed.
  *   TASK B (PERSPECTIVIST PREDICTION): Systems will be asked to predict each 
individual annotator's label on each item. EVALUATION: a measure of the 
correctness of the predicted annotator labels will be computed.

Participants will be able to submit to one or both tasks. A minimal sketch of 
the two evaluation settings follows.
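
To make the two settings concrete, here is a minimal Python sketch. The item, 
the annotator IDs, the use of Manhattan distance for Task A, and plain 
per-annotator accuracy for Task B are illustrative assumptions only; the 
official data format and evaluation metrics will be released with the task data.

from collections import Counter

# Hypothetical annotations for a single CSC item: annotator ID -> sarcasm rating.
annotations = {"ann1": 4, "ann2": 5, "ann3": 4, "ann4": 2}
label_values = [1, 2, 3, 4, 5, 6]  # CSC rating scale

def soft_label(annotations, label_values):
    # Turn raw annotator ratings into a probability distribution (the Task A target).
    counts = Counter(annotations.values())
    total = sum(counts.values())
    return [counts.get(v, 0) / total for v in label_values]

def manhattan_distance(p, q):
    # One possible distance between predicted and human soft labels
    # (an assumption for illustration, not the official metric).
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def annotator_accuracy(predicted, gold):
    # Task B: fraction of annotators whose label is predicted correctly
    # (an assumed correctness measure).
    return sum(predicted[a] == gold[a] for a in gold) / len(gold)

# Task A: a system outputs a distribution over the six rating values.
human_soft = soft_label(annotations, label_values)   # [0.0, 0.25, 0.0, 0.5, 0.25, 0.0]
predicted_soft = [0.0, 0.1, 0.1, 0.5, 0.2, 0.1]
print("Task A distance:", manhattan_distance(predicted_soft, human_soft))

# Task B: a system predicts each annotator's individual label.
predicted_labels = {"ann1": 4, "ann2": 5, "ann3": 4, "ann4": 3}
print("Task B accuracy:", annotator_accuracy(predicted_labels, annotations))
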
==== Important Dates ====
Training data ready:         May 15, 2025
Evaluation starts:           June 20, 2025
Evaluation ends:             July 15, 2025
Paper submission due:        TBA
Notification to authors:     TBA
NLPerspectives workshop:     November 12-14, 2025
We are looking forward to your submission!
The LeWiDi team