Dear colleagues,


You are invited to participate in the 5th Workshop on Scholarly Document
Processing (SDP 2025) to be held at ACL 2025 in Vienna, Austria. SDP 2025
will consist of a research track and six shared tasks. The call for
research papers is described below, and more details can be found on our
website, https://sdproc.org/2025/.



Papers must follow the *ACL format* and conform to the *ACL 2025 Submission
Guidelines*. Papers must be submitted through OpenReview:
https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc



   - Website: https://sdproc.org/2025/
   - Submission site:
   https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
   - X (Twitter): https://twitter.com/sdpworkshop
   - Shared tasks: https://sdproc.org/2025/sharedtasks.html
   - *Paper submission deadline: March 1 (Saturday), 2025*

Call for Research Papers

Scholarly literature is the chief means by which scientists and academics
document and communicate their results and is therefore critical to the
advancement of knowledge and improvement of human well-being. At the same
time, this literature poses challenges to NLP uncommon in other genres,
such as specialized language and high background knowledge requirements,
long documents and strong structural conventions, multimodal presentation,
citation relationships among documents, an emphasis on rational
argumentation, and the frequent availability of detailed metadata and
experimental data. These challenges necessitate the development of NLP
methods and resources optimized for this domain. The Scholarly Document
Processing (SDP) workshop provides a venue for discussing these challenges,
bringing together stakeholders from different communities including
computational linguistics, machine learning, text mining, information
retrieval, digital libraries, scientometrics and others, to develop
methods, tasks, and resources in support of these goals.



This workshop builds on the success of prior workshops: the SDP workshops
held at EMNLP 2020, NAACL 2021, COLING 2022, and ACL 2024, and the first and
second SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad
appeal within the NLP community, we hope the SDP workshop will attract
researchers from other relevant fields including meta-science,
scientometrics, data mining, information retrieval, and digital libraries,
bringing together these disparate communities within ACL.

Topics of Interest

We invite submissions from all communities demonstrating usage of and
challenges associated with natural language processing, information
retrieval, and data mining of scholarly and scientific documents. Relevant
topics include (but are not limited to):



   - Large Language Models (LLMs) for science
   - Representation learning and language modeling
   - Information extraction and NER
   - Document understanding
   - Summarization and generation
   - Question-answering
   - Discourse modeling/argumentation mining
   - Network analysis
   - Bibliometrics, scientometrics, and altmetrics
   - Reproducibility and research integrity, including new challenges posed
   by generative AI
   - Peer review tools, principles and technology
   - Metadata and indexing
   - Inclusion of datasets and computational resources
   - Research infrastructures and digital libraries
   - Increasing the representation of disadvantaged populations in
   scholarly work
   - LLM-based interfaces to consume/produce scholarly documents
   - Impact of scholarly communication on popular discourse

Submission Information

Authors are invited to submit long and short papers describing unpublished,
original work. Submissions will be subject to a double-blind peer-review
process. Accepted papers will be presented by the authors at the workshop
either as a talk or a poster. All accepted papers will be published in the
workshop proceedings (proceedings from previous years can be found here:
https://aclanthology.org/venues/sdp/), which will be published in the ACL
Anthology.



The submissions must be in PDF format and anonymized for review. All
submissions must be written in English and follow the ACL 2025 formatting
requirements:



*Long paper submissions:* up to 8 pages of content, plus unlimited
references.

*Short paper submissions:* up to 4 pages of content, plus unlimited
references.



*Submission Website:* Papers must be submitted through OpenReview:

<https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc>



Final versions of accepted papers will be allowed one additional page of
content so that reviewer comments can be taken into account.

Important Dates (Main Research Track)

   - First call for workshop papers: December 19, 2024
   - Second call for workshop papers: February 21, 2025
   - *Paper submission deadline: March 1, 2025*
   - Pre-reviewed (ARR) submission deadline: March 25, 2025
   - Notification of acceptance: April 17, 2025
   - Camera-ready paper due: May 16, 2025
   - Workshop dates: July 31 – August 1, 2025

Note: Shared task submission deadlines and other important dates to be
announced.

SDP 2025 Keynote Speakers

We are excited to have several keynote speakers at SDP 2025.



   1. *Tom Hope* <https://tomhoper.github.io/>, Assistant Professor at the
   Hebrew University of Jerusalem and Research Scientist at the Allen
   Institute for AI.
   2. *James A. Evans*
   <https://sociology.uchicago.edu/directory/James-A-Evans>, Professor and
   Director of the Knowledge Lab at the University of Chicago and External
   Professor at the Santa Fe Institute.
   3. *TBA*

SDP 2025 Shared Tasks

SDP 2025 will host six exciting shared tasks. More information about all
shared tasks is provided on the workshop website:
https://sdproc.org/2025/sharedtasks.html



*Detecting automatically generated scientific papers (DAGPap 25)*

With the ubiquity of generative AI, it has become very easy to generate fake
scientific papers. This can erode public trust in science and undermine its
foundations: are we standing on the shoulders of robots? The Detecting
Automatically Generated Papers (DAGPap) competition aims to encourage the
development of robust, reliable AI-generated scientific text detection
systems, using a diverse dataset and varied machine learning models across a
number of scientific domains.

Organizers: Savvas Chamezopoulos, Dan Li, Anita de Waard (Elsevier).



*Contextualizing Scientific Figures and Tables (Context 25)*

Interpreting scientific claims in the context of empirical findings is a
valuable practice, yet extremely time-consuming for researchers. Such
interpretation requires identifying key results (often captured in tables
and figures) that provide supporting evidence from research papers, and
contextualizing these results with associated methodological details (e.g.,
measures, sample, etc.). In the previous edition of this shared task in 2024,
we released datasets to support the development of methods for automatically
identifying key result figures or tables, as well as additional grounding
context, to make claim interpretation more efficient. However, the released
datasets contained tables and images already extracted from the scientific
papers, allowing participants to bypass PDF pre-processing issues. In Context
25, given recent advances in multimodal LLMs, we plan to increase the
difficulty of the task by requiring participants to identify key results from
paper PDFs directly, and to add a new sub-task on multi-hop reasoning over
scientific evidence.

Organizers: Joel Chan, Matthew Akamatsu, Aakanksha Naik.



*Scientific Visual Question Answering (SciVQA)*

Scholarly articles convey valuable information not only through
unstructured text but also via (semi-)structured figures such as charts and
diagrams. Automatically interpreting the semantics of knowledge encoded in
these figures can be beneficial for downstream tasks such as question
answering (QA). In the SciVQA challenge, participants will develop multimodal
systems capable of efficiently processing both visual (i.e., addressing
attributes such as colour, shape, and size) and non-visual QA pairs based on
images of scientific figures and their captions.

Organizers: Ekaterina Borisova, Georg Rehm.



*Scientific Fact-checking of Social Media Posts on Climate Change
(ClimateCheck)*

The ClimateCheck shared task focuses on fact-checking claims from social
media about climate change against peer-reviewed scholarly articles.
Participants will retrieve relevant publications from a corpus of 400,000
climate research articles and classify each abstract as supporting,
refuting, or not having enough information about the claim. Training data
will include human-annotated claim-publication pairs, and the evaluation
will combine nDCG@K and Bpref for retrieval and F1 score for
classification. The task aims to develop models that link social media
claims to scientific evidence, promoting informed and evidence-based
discussions on climate change.
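
For concreteness, a minimal Python sketch of the two metric families named
above follows. This is purely illustrative, not the official evaluation
script: the verdict label names and the macro-averaged F1 variant are
assumptions made for the example, and Bpref is omitted for brevity.

import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # nDCG@K with binary relevance: gain 1 if a retrieved abstract is relevant.
    gains = [1.0 if doc in relevant_ids else 0.0 for doc in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0

def macro_f1(gold, pred, labels=("supports", "refutes", "not_enough_info")):
    # Macro-averaged F1 over the three verdict labels (label names assumed).
    scores = []
    for label in labels:
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Hypothetical example: a system ranks five abstracts for one claim and
# predicts verdicts for three claim-abstract pairs.
print(ndcg_at_k(["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}, k=5))  # ~0.65
print(macro_f1(["supports", "refutes", "supports"],
               ["supports", "not_enough_info", "supports"]))         # ~0.33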

Organizers: Raia Abu Ahmad, Georg Rehm.



*Software Mention Detection in Scholarly Publications (SOMD 25)*

Software plays an essential role in computational research methods and is
considered one of the crucial entities in scholarly documents. However,
software is not always formally cited in academic documents, resulting in
various informal mentions of software across a paper. Automatic identification
of such software mentions contributes to the better understanding,
accessibility, and reproducibility of research work. Beyond detecting the
mention itself, understanding the research context also requires identifying
the purpose of a software mention and its attributes, making software mention
detection a comprehensive task.

We are extending the first iteration of the shared task, SOMD 2024
<https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html>, with new
challenges. In addition to information extraction techniques, our extended
focus will be on joint named entity and relation classification techniques.

Organizers: Sharmila Upadhyaya, Frank Krueger, Stefan Dietze.



*SciHal2025: Hallucination Detection for Scientific Content*

Generative AI-enhanced academic research assistants are transforming how
research is conducted. By allowing users to pose research-related questions
in natural language, these systems can generate structured and concise
summaries supported by relevant references. However, hallucinations, i.e.,
unsupported claims introduced by large language models, remain a significant
obstacle to fully trusting these automatically generated scientific answers.
The SciHal shared task invites participants to develop systems that detect
such hallucinated claims in AI-generated answers to research questions.




Organizing Committee

   - Tirthankar Ghosal, Oak Ridge National Laboratory, USA
   - Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences,
   Germany
   - Aakanksha Naik, Allen Institute for AI, USA
   - Amanpreet Singh, Allen Institute for AI, USA
   - Anita de Waard, Elsevier, Netherlands
   - Dayne Freitag, SRI International, USA
   - Georg Rehm, German Research Center for Artificial Intelligence (DFKI),
   Germany
   - Sonja Schimmler, Fraunhofer FOKUS, Germany
   - Dan Li, Elsevier, Netherlands