-----------------------------------------------------
Shared Task on Multilingual Counterspeech Generation
-----------------------------------------------------

*WE HAVE EXTENDED THE DEADLINES!*
(You can find the new deadlines in the "Important Dates" section of this CFP)

In addition to paper contributions, we are organizing a shared task on 
multilingual counterspeech generation. Its aim is to bring together current 
efforts in a central space, especially those for languages other than English.
We expect the shared task to help the community study how to improve 
counterspeech generation for lower-resource languages while also reinforcing 
the strong body of research that already exists for English.
The counterspeech generated by participants should be respectful, 
non-offensive, and contain information that is specific and truthful with 
respect to the following targets: Jews, LGBT+, immigrants, people of color, 
women.

Data
---------------------
 We release new data consisting of 596 Hate Speech-Counter Narrative (HS-CN) 
pairs. In this dataset, the HS are taken from MTCONAN 
[https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN], while 
the CN are newly generated. Together with each HS-CN pair, we also provide 5 
background knowledge sentences, some of which are relevant for obtaining the 
Counter Narratives. The dataset is available in 4 different languages (Basque, 
English, Italian and Spanish) and is divided into the following splits:
 - Development: 100 pairs. [AVAILABLE NOW!] 
[https://huggingface.co/datasets/LanD-FBK/ML_MTCONAN_KN]
 - Train: 396 pairs [AVAILABLE NOW!] 
[https://huggingface.co/datasets/LanD-FBK/ML_MTCONAN_KN]
 - Test: 100 pairs [AVAILABLE ON 28th OCTOBER] 

 In order to score the shared task participants, the CNs will be kept hidden 
during the shared task while the HS and the background knowledge will be 
released for participants to prepare their submissions.

 The languages (Basque, English, Italian and Spanish) offer a varied spectrum 
of complexity, including an agglutinative language isolate (Basque), two Romance 
languages (Italian, Spanish) and a Germanic one (English). The choice of 
languages reflects the linguistic expertise available among the organizers, 
which is required to run the shared task successfully.

Additional Multilingual Data
---------------------
 Participants may also use the manually curated CONAN data for English, 
Italian, Basque and Spanish:
 - CONAN (English and Italian) 
[https://github.com/marcoguerini/CONAN/tree/master/CONAN]
 - CONAN (Basque and Spanish) [https://huggingface.co/datasets/HiTZ/CONAN-EUS]
 - CONAN-MT-SP (Spanish) [https://github.com/sinai-uja/CONAN-MT-SP] or 
[https://huggingface.co/datasets/SINAI/CONAN-MT-SP]
 - Multitarget-CONAN (English) 
[https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN]

Participation
---------------------
 The aim of the shared task is to generate a CN that counteracts a given HS, 
optionally using any additional knowledge the participants wish to include.
An example HS-CN pair in each language appears below:
 - English:
   + HS: We should ban all homosexuals.
   + CN: When will the love prosper and the hatred start to dissipate? I will 
not only respect my fellow LGBT+ people, I will promote their rights.
 - Italian:
   + HS: Dovremmo espellere tutti gli omosessuali.
   + CN: Quand'è che l'amore prospererà e l'odio inizierà a dissiparsi? Non 
solo rispetterò i miei compagni LGBT+, ma promuoverò i loro diritti.
 - Spanish:
   + HS: Deberíamos prohibir a todos los homosexuales.
   + CN: ¿Cuándo prosperará el amor y empezará a disiparse el odio? No sólo 
respetaré a mis compañeros LGBT+, sino que promoveré sus derechos.
 - Basque:
   + HS: Homosexual guztiak debekatu beharko genituzke.
   + CN: Noiz hasiko da maitasuna irabazten eta gorrotoa desagertzen? LGBT+ 
pertsonak errespetatzeaz gain, haien eskubideak sustatuko ditut.
 
Participants will download the test HS for the 4 languages and generate at most 
three different CNs per HS for each language. The test window will last 5 
days.
Participants are allowed to use any resource (language model, data, etc.) to 
generate the CN.
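
The limit of three different CNs per HS and language can be checked before 
submitting. The sketch below is only illustrative: this call does not describe 
the official submission format, so the nested dictionary layout (language code 
-> HS identifier -> list of CNs), the language codes and the function name 
`check_submission` are all assumptions.

```python
from typing import Dict, List

# Language codes are an assumption; the call only names the four languages.
LANGUAGES = ("eu", "en", "it", "es")
MAX_CNS_PER_HS = 3  # "at most three different CNs per HS for each language"


def check_submission(submission: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    `submission` maps a language code to a mapping from a (hypothetical)
    HS identifier to the list of CNs generated for that HS.
    """
    problems = []
    for lang, per_hs in submission.items():
        if lang not in LANGUAGES:
            problems.append(f"unknown language code: {lang}")
            continue
        for hs_id, cns in per_hs.items():
            # Ignore empty strings and surrounding whitespace when counting.
            cleaned = [cn.strip() for cn in cns if cn.strip()]
            if len(cleaned) > MAX_CNS_PER_HS:
                problems.append(
                    f"{lang}/{hs_id}: {len(cleaned)} CNs, limit is {MAX_CNS_PER_HS}"
                )
            if len(set(cleaned)) < len(cleaned):
                problems.append(f"{lang}/{hs_id}: duplicate CNs")
    return problems
```

Calling `check_submission` on the full submission and inspecting the returned 
list before upload would catch HS entries with too many or repeated CNs.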

**Note:** If you are going to participate in the shared task, please fill in 
the following form: 
[https://docs.google.com/forms/d/e/1FAIpQLSeAZTJsrEXt35HfFFchPNdPi289q5kKerqcaKnyLTw-8ONYJw/viewform?usp=sf_link]
 

Evaluation
---------------------
The CNs submitted by the participants will be evaluated:
 - Using traditional automatic metrics as in Tekiroglu et al. (2022), which 
include BLEU, ROUGE, Novelty and Repetition Rate.
 - Using an LLM as a judge, following the approach described in 
https://arxiv.org/abs/2406.15227
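
For orientation, two of the automatic metrics named above can be sketched in a 
few lines. This is a simplified illustration under stated assumptions, not the 
organizers' evaluation script: it assumes lowercased whitespace tokenization, 
takes novelty as one minus the highest Jaccard similarity between a generated 
CN and any training CN, and takes repetition rate as the geometric mean over 
n = 1..4 of the share of n-gram types that occur more than once, without the 
sliding window used in the original formulation of that metric. The function 
names are hypothetical.

```python
from collections import Counter
from math import prod
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between the token sets of two sequences."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def novelty(generated: str, training_cns: List[str]) -> float:
    """1 minus the highest Jaccard similarity to any training CN."""
    tokens = generated.lower().split()
    best = max(
        (jaccard(tokens, t.lower().split()) for t in training_cns),
        default=0.0,
    )
    return 1.0 - best


def repetition_rate(texts: List[str], max_n: int = 4) -> float:
    """Geometric mean over n = 1..max_n of the share of repeated n-gram types.

    Simplification: n-grams are counted over all texts at once, and 0.0 is
    returned if no n-gram of some order exists (e.g. all texts are too short).
    """
    rates = []
    for n in range(1, max_n + 1):
        counts = Counter()
        for text in texts:
            tokens = text.lower().split()
            counts.update(
                tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
            )
        if not counts:
            return 0.0
        repeated = sum(1 for c in counts.values() if c > 1)
        rates.append(repeated / len(counts))
    return prod(rates) ** (1.0 / max_n)
```

Scores from a sketch like this are only useful for quick sanity checks during 
development; the official scores may be computed differently.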

Important Dates
---------------------
 - Test dataset release: October 28th, 2024
 - Results submission: November 4th, 2024
 - Results notification: November 15th, 2024
 - Working papers submission: November 25th, 2024
 - Notification of Acceptance: December 8th, 2024
 - Camera-Ready Papers Due: December 13th, 2024
 - Workshop: January 19th, 2025

-----------------------------------------------------
Workshop on Multilingual Counterspeech Generation
-----------------------------------------------------
The Shared Task is associated with the First Workshop on Multilingual 
Counterspeech Generation at COLING 2025.

Background and Scope
---------------------
While interest in automatic approaches to counterspeech generation has been 
steadily growing, including studies on data curation (Chung et al., 2019a; 
Fanton et al., 2021), detection (Chung et al., 2021a; Mathew et al., 2018), and 
generation (Tekiroglu et al., 2020; Chung et al., 2021b; Zhu and Bhat, 2021; 
Tekiroglu et al., 2022), the large majority of the published experimental work 
on automatic counterspeech generation has been carried out for English. This is 
due both to the scarcity of manually curated non-English training data and to 
the overwhelming predominance of English in the generative Large Language 
Model (LLM) ecosystem. This workshop on Multilingual Counterspeech Generation 
aims to promote and encourage research on multilingual approaches to this 
challenging topic.

Thus, this workshop aims to test how well monolingual and multilingual LLMs in 
particular, and Language Technology in general, can automatically generate 
counterspeech not only in English but also in languages with fewer resources. 
In this sense, an important goal of the workshop will be to understand the 
impact of using LLMs, considering for example how to deal with pressing issues 
such as biases, hallucinated content, data scarcity or data contamination.

We seek to maximize the scientific and social impact of this workshop by 
promoting the creation of a community of researchers from diverse fields, such 
as computer and social sciences, as well as policy makers and other 
stakeholders interested in automatic counterspeech generation. By doing so, we 
aim to gain a deeper understanding of how individuals, activists, and 
organizations currently use counterspeech to tackle abuse, and of how Natural 
Language Processing (NLP) and Generation (NLG) may best be applied to 
counteract it.

Call for Papers
---------------------
We welcome submissions on topics including, but not limited to, the following:
 - Models and methods for generating counterspeech in different languages.
 - Automatic counterspeech generation for low-resource languages with scarce 
training data.
 - Dialogue agents that use counterspeech to combat offensive messages that are 
directed to individuals or groups, targeted based on various aspects such as 
ideology, gender, sexual orientation and religion.
 - Methods for human and automatic evaluation of counterspeech. 
 - Multidisciplinary studies providing different perspectives on the topic such 
as computer science, social science, psychology, etc.
 - Development of taxonomies and quality datasets for counterspeech in multiple 
languages.
 - Potentials and limitations (e.g., fairness, biases, hallucinated content) of 
applying different NLP methods, such as LLMs, to generate counterspeech.
 - Social impact and empirical studies of counterspeech in social networks, 
including research on the effectiveness and consequences for users of using 
counterspeech to combat hate online.

Submission
---------------------
We welcome two types of papers: regular workshop papers and non-archival 
submissions. Regular workshop papers will be included in the workshop 
proceedings. All submissions must be in PDF format and made through START  
[https://softconf.com/coling2025/MCG25/]
 - Regular workshop papers: Authors can submit papers of up to 8 pages, with 
unlimited pages for references. Authors may separately submit up to 100 MB of 
supplementary materials, as well as their code for reproducibility. All 
submissions undergo a double-blind, single-track review. Accepted papers will 
be presented as posters with the possibility of oral presentations.
 - Non-archival submissions: Cross-submissions are welcome. Papers accepted at 
other venues or journals will be presented at the workshop, but will not be 
included in the workshop proceedings. Papers must be in PDF format and will be 
reviewed in a double-blind fashion by workshop reviewers. We also welcome 
extended abstracts (up to 2 pages) of papers that are work in progress, under 
review or to be submitted to other venues. Papers in this category need to 
follow the COLING format.

Important Dates
---------------------
 - Submission: November 20th, 2024
 - Notification of Acceptance: December 2nd, 2024
 - Camera-Ready Papers Due: December 10th, 2024


For more information, you can join the Google group 
[https://groups.google.com/g/multilingual-cs-generation-coling2025] or visit 
our website [https://sites.google.com/view/multilang-counterspeech-gen/home] 
and the shared task page 
[https://sites.google.com/view/multilang-counterspeech-gen/shared-task].

Best regards,
The Multilingual Counterspeech Generation Workshop Organizers.