[Corpora-List] First CfP: First Workshop on Data Contamination (CONDA) @ ACL 2024

Eneko Agirre via Corpora Fri, 26 Jan 2024 07:31:26 -0800

We invite you to participate and submit your work to the First Workshop on Data Contamination (CONDA) co-located with ACL 2024 in Bangkok, Thailand.

Data contamination, where evaluation data is inadvertently included in pre-training corpora of large scale models, and language models (LMs) in particular, has become a concern in recent times. The growing scale of both models and data, coupled with massive web crawling, has led to the inclusion of segments from evaluation benchmarks in the pre-training data of LMs. The scale of internet data makes it difficult to prevent this contamination from happening, or even detect when it has happened. Crucially, when evaluation data becomes part of pre-training data, it introduces biases and can artificially inflate the performance of LMs on specific tasks or benchmarks. This poses a challenge for fair and unbiased evaluation of models, as their performance may not accurately reflect their generalization capabilities.

Although a growing number of papers and state-of-the-art models mention issues of data contamination, there is no agreed-upon definition or standard methodology to ensure that a model does not report results on contaminated benchmarks. Addressing data contamination is a shared responsibility among researchers, developers, and the broader community. By adopting best practices, increasing transparency, documenting vulnerabilities, and conducting thorough evaluations, we can work towards minimizing the impact of data contamination and ensuring fair and reliable evaluations.

We welcome paper submissions on all topics related to data contamination, including but not limited to:


 * Definitions, taxonomies, and gradings of contamination
 * Contamination detection (both manual and automatic)
 * Community efforts to discover, report, and organize contamination events
 * Documentation frameworks for datasets or models
 * Methods to avoid data contamination
 * Methods to forget contaminated data
 * Scaling laws and contamination
 * Memorization and contamination
 * Policies to avoid impact of contamination in publication venues and
   open source communities
 * Reproducing and attributing results from previous work to data
   contamination
 * Survey work on data contamination research
 * Data contamination in other modalities

*Submission Instructions*

We welcome two types of papers: regular workshop papers and non-archival submissions. Regular workshop papers will be included in the workshop proceedings. All submissions must be in PDF format and made through OpenReview <https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CONDA>.


 * *Regular workshop papers:* Authors can submit papers up to 8 pages,
   with unlimited pages for references. Authors may submit up to 100 MB
   of supplementary materials separately and their code for
   reproducibility. All submissions undergo a double-blind
   single-track review. Best Paper Award(s) will be given based on
   nomination by the reviewers. Accepted papers will be presented as
   posters with the possibility of oral presentations.
 * *Non-archival submissions:* Cross-submissions are welcome. Accepted
   papers will be presented at the workshop, but will not be included
   in the workshop proceedings. Papers must be in PDF format and will
   be reviewed in a double-blind fashion by workshop reviewers. We also
   welcome extended abstracts (up to 2 pages) of papers that are work
   in progress, under review or to be submitted to other venues. Papers
   in this category need to follow the ACL format.

In addition to papers submitted directly to the workshop, which will be reviewed by our Programme Committee, we also accept papers reviewed through ACL Rolling Review and committed to the workshop. Please check the relevant dates for each type of submission.


*Important dates*
Relevant deadlines to consider when submitting your paper are:

 * Paper submission deadline: May 17 (Friday), 2024
 * ARR pre-reviewed commitment deadline: TBD, 2024
 * Notification of acceptance: June 17 (Monday), 2024
 * Camera-ready paper due: July 1 (Monday), 2024
 * Workshop date: August 16, 2024

*Contact*

 * *Website:* https://conda-workshop.github.io/
 * *Contact:* [email protected]

*Workshop organizers*
Oscar Sainz, University of the Basque Country (UPV/EHU)
Iker García Ferrero, University of the Basque Country (UPV/EHU)
Eneko Agirre, University of the Basque Country (UPV/EHU)
Jon Ander Campos, Cohere
Alon Jacovi, Bar Ilan University
Yanai Elazar, Allen Institute for Artificial Intelligence and University of Washington
Yoav Goldberg, Bar Ilan University and Allen Institute for Artificial Intelligence

