Dear colleagues,

The fourth iteration of the Generation, Evaluation & Metrics (GEM) Workshop
<https://gem-benchmark.com/workshop> will be held as part of ACL
<https://2025.aclweb.org/>, July 27–August 1, 2025. This year we're
planning a major upgrade to the workshop, which we dub GEM2: the
introduction of a large dataset of 1B model predictions together with
prompts and gold-standard references, encouraging researchers from all
backgrounds to submit work on meaningful, efficient, and robust evaluation
of LLMs.

Overview

Evaluating large language models (LLMs) is challenging. Running LLMs over
medium- or large-scale corpora can be prohibitively expensive; they are
consistently shown to be highly sensitive to prompt phrasing; and it is
hard to formulate metrics that differentiate and rank different LLMs in a
meaningful way. Consequently, results obtained over popular benchmarks
such as HELM or MMLU can lead to brittle conclusions. We
believe that meaningful, efficient, and robust evaluation is one of the
cornerstones of the scientific method, and that achieving it should be a
community-wide goal. In this workshop we seek innovative research relating
to the evaluation of LLMs and language generation systems in general. We
welcome submissions related, but not limited to, the following topics:

   - Automatic evaluation of generation systems.

   - Creating evaluation corpora and challenge sets.

   - Critiques of benchmarking efforts and responsibly measuring progress in
     LLMs.

   - Effective and/or efficient NLG methods that can be applied to a wide
     range of languages and/or scenarios.

   - Application and evaluation of LLMs interacting with external data and
     tools.

   - Evaluation of sociotechnical systems employing large language models.

   - Standardizing human evaluation and making it more robust.

   - In-depth analyses of outputs of existing systems, for example through
     error analyses, by applying new metrics, or by testing the system on
     new test sets.


Following the success of previous iterations, GEM2 will also hold an
Industrial Track, which aims to provide actionable insights to industry
professionals and to foster collaborations between academia and industry.
This track will address the unique challenges faced by non-academic
colleagues, highlight the differences in evaluation practices between
academic and industrial research, and explore the challenges of evaluating
generative models with real-world data. The Industrial Track invites
submissions on topics including (but not limited to):

   - Breaking Barriers: Bridging the Gap between Academic and Industrial
     Research.

   - From Data Diversity to Model Robustness: Challenges in Evaluating
     Generative Models with Real-World Data.

   - Beyond Metrics: Evaluating Generative Models for Real-World Business
     Impact.

How to submit?

Submissions can take either of the following forms:

   - Archival Papers describing original and unpublished work can be
     submitted in a 4- to 8-page format.

   - Non-Archival Abstracts: to discuss work already presented or under
     review at a peer-reviewed venue, we allow the submission of 2-page
     abstracts.


Papers should be submitted directly through OpenReview
<https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/GEM&referrer=%5BHomepage%5D(%2F)>,
selecting the appropriate track, and conform to ACL 2025 style guidelines
<https://2025.aclweb.org/calls/main_conference_papers/#paper-submission-details>.
We additionally welcome presentations by authors of papers accepted to the
Findings of ACL. The selection process for Findings papers is managed
centrally by the conference workshop chairs, so we cannot respond to
individual inquiries; however, we will try our best to accommodate
authors' requests.

Important Dates

   - April 11: Direct paper submission deadline (ARR).

   - May 5: Pre-reviewed (ARR) commitment deadline.

   - May 19: Notification of acceptance.

   - June 6: Camera-ready paper deadline.

   - July 7: Pre-recorded videos due.

   - July 31 - August 1: Workshop at ACL in Vienna.

Contact

For any questions, please check the workshop page
<https://gem-benchmark.com/workshop> or email the organisers:
[email protected].

best,
simon

*ADAPT Research Centre / Ionaid Taighde ADAPT*
*School of Computing, Dublin City University, Glasnevin Campus
/ Scoil na Ríomhaireachta,
Campas Ghlas Naíon, Ollscoil Chathair Bhaile Átha Cliath*