Welcome to SHROOM, a Shared-task on Hallucinations and Related Observable 
Overgeneration Mistakes!


Task description: SHROOM participants will need to detect grammatically sound 
output that contains incorrect semantic information (i.e. unsupported or 
inconsistent with the source input), with or without having access to the model 
that produced the output.


Overview of the task: The modern NLG landscape is plagued by two interlinked 
problems:

On the one hand, current neural models have a propensity to produce 
inaccurate but fluent outputs; on the other hand, our metrics are better 
suited to measuring fluency than correctness. As a result, neural networks 
“hallucinate”, i.e., produce fluent but incorrect outputs that we currently 
struggle to detect automatically. For many NLG applications, however, the 
correctness of an output is mission-critical. For instance, a 
plausible-sounding translation that is inconsistent with the source text 
jeopardizes the usefulness of a machine translation pipeline. With our shared 
task, we hope to foster the community's growing interest in this topic.


With SHROOM we adopt a post hoc setting, where models have already been trained 
and outputs already produced: participants will be asked to perform binary 
classification to identify cases of fluent overgeneration hallucinations in two 
different tracks: a model-aware track and a model-agnostic track. In the 
former, participants have access to the model that produced the output; in the 
latter, they do not. To ensure a low barrier to entry, we format the task as a 
binary classification problem. We now also provide a baseline kit, containing a 
baseline system, a format checker and the scoring program.

All systems will be rated on accuracy (i.e., the proportion of test examples 
correctly labeled) and calibration (i.e., the correlation between the 
probability assigned by a system and the proportion of annotators marking a 
production as hallucinatory).
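
As a rough illustration only (the authoritative implementation is the scoring 
program in the baseline kit; the function names and label strings below are 
hypothetical), the two metrics could be sketched as follows:

```python
from math import sqrt

def accuracy(pred_labels, gold_labels):
    """Proportion of test examples labeled correctly."""
    correct = sum(p == g for p, g in zip(pred_labels, gold_labels))
    return correct / len(gold_labels)

def calibration(pred_probs, annotator_fracs):
    """Pearson correlation between the probability a system assigns to
    the hallucination label and the fraction of annotators who marked
    the production as hallucinatory."""
    n = len(pred_probs)
    mp = sum(pred_probs) / n
    ma = sum(annotator_fracs) / n
    cov = sum((p - mp) * (a - ma)
              for p, a in zip(pred_probs, annotator_fracs))
    var_p = sum((p - mp) ** 2 for p in pred_probs)
    var_a = sum((a - ma) ** 2 for a in annotator_fracs)
    return cov / sqrt(var_p * var_a)

# Toy example: one of two items labeled correctly.
print(accuracy(["Hallucination", "Not Hallucination"],
               ["Hallucination", "Hallucination"]))  # 0.5
# Probabilities that track annotator agreement correlate highly.
print(calibration([0.9, 0.1, 0.6], [1.0, 0.0, 0.6]))
```

The key point is that a system is rewarded not only for the hard label it 
outputs but also for how well its confidence tracks annotator disagreement.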


We provide participants with a collection of checkpoints, inputs, references 
and outputs of systems covering three NLG tasks: definition modeling (DM), 
machine translation (MT), and paraphrase generation (PG), trained with varying 
degrees of accuracy. The development set provides binary annotations from five 
different annotators and a majority vote gold label.
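
For concreteness, deriving the majority-vote gold label from five binary 
annotations could look like the sketch below (the label strings and field 
layout are illustrative assumptions; the released data files define the actual 
format):

```python
from collections import Counter

def majority_label(annotations):
    """Gold label = the label chosen by most annotators.
    With five binary annotations, a strict majority always exists."""
    label, _count = Counter(annotations).most_common(1)[0]
    return label

def hallucination_fraction(annotations):
    """Fraction of annotators marking the production as a hallucination
    (the quantity a system's probability is compared against)."""
    return sum(a == "Hallucination" for a in annotations) / len(annotations)

votes = ["Hallucination", "Hallucination", "Not Hallucination",
         "Hallucination", "Not Hallucination"]
print(majority_label(votes))          # Hallucination
print(hallucination_fraction(votes))  # 0.6
```

Because the number of annotators is odd, ties cannot occur, so the majority 
vote is always well defined.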


Anyone wishing to participate in the task is welcome! Participants will have to

  *   Submit at least once during the evaluation phase in January;

  *   Write a system description paper before February 19;

  *   Review other system description papers (max. 2).


Trial, dev and train data are now available on the task website:
https://helsinki-nlp.github.io/shroom/

Codalab competition: https://codalab.lisn.upsaclay.fr/competitions/15726

Join the mailing group: 
https://groups.google.com/u/1/g/semeval-2024-task-6-shroom

Updates on Twitter: @shroom2024 (https://twitter.com/shroom2024)


Important dates:

  *   Sample data ready: July 15th, 2023

  *   Validation data ready: September 11th, 2023

  *   Unlabeled train data ready: September 22nd, 2023

  *   Evaluation period starts (test set released): January 10th, 2024

  *   Evaluation period ends: January 31st, 2024

  *   Workshop paper submission deadline: February 19th, 2024

  *   Notification to authors: March 18th, 2024

  *   SemEval workshop: June 16–21, 2024, Mexico (co-located with NAACL 2024)



Task organizers

  *   Elaine Zosa, Silo AI, Finland

  *   Raúl Vázquez, University of Helsinki, Finland

  *   Jörg Tiedemann, University of Helsinki, Finland

  *   Vincent Segonne, Southern Brittany University, France

  *   Teemu Vahtola, University of Helsinki, Finland

  *   Alessandro Raganato, University of Milano-Bicocca, Italy

  *   Timothee Mickus, University of Helsinki, Finland

  *   Marianna Apidianaki, University of Pennsylvania, USA


_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/