Dear colleagues,
 
You are invited to participate in the 5th Workshop on Scholarly Document 
Processing (SDP 2025) to be held at ACL 2025 in Vienna, Austria. SDP 2025 will 
consist of a research track and six shared tasks. The call for research papers 
is described below, and more details can be found on our website, 
https://sdproc.org/2025/.
 
Papers must follow the ACL format and conform to the ACL 2025 Submission 
Guidelines. Paper submission has to be done through OpenReview: 
https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
 
Website: https://sdproc.org/2025/
Submission site: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
X (Twitter): https://twitter.com/sdpworkshop
Shared tasks: https://sdproc.org/2025/sharedtasks.html
Paper submission deadline: March 1 (Saturday), 2025
Call for Research Papers
Scholarly literature is the chief means by which scientists and academics 
document and communicate their results and is therefore critical to the 
advancement of knowledge and improvement of human well-being. At the same time, 
this literature poses challenges to NLP uncommon in other genres, such as 
specialized language and high background knowledge requirements, long documents 
and strong structural conventions, multimodal presentation, citation 
relationships among documents, an emphasis on rational argumentation, and the 
frequent availability of detailed metadata and experimental data. These 
challenges necessitate the development of NLP methods and resources optimized 
for this domain. The Scholarly Document Processing (SDP) workshop provides a 
venue for discussing these challenges, bringing together stakeholders from 
different communities including computational linguistics, machine learning, 
text mining, information retrieval, digital libraries, scientometrics and 
others, to develop methods, tasks, and resources in support of these goals.
 
This workshop builds on the success of prior workshops: SDP workshops held at 
EMNLP 2020, NAACL 2021, COLING 2022, and ACL 2024, and the 1st and 2nd SciNLP 
workshops held at AKBC 2020 and 2021. In addition to having broad appeal within 
the NLP community, we hope the SDP workshop will attract researchers from other 
relevant fields including meta-science, scientometrics, data mining, 
information retrieval, and digital libraries, bringing together these disparate 
communities within ACL.

Topics of Interest
We invite submissions from all communities demonstrating usage of and 
challenges associated with natural language processing, information retrieval, 
and data mining of scholarly and scientific documents. Relevant topics include 
(but are not limited to):

Large Language Models (LLMs) for science
Representation learning and language modeling
Information extraction and NER
Document understanding
Summarization and generation
Question-answering
Discourse modeling/argumentation mining
Network analysis
Bibliometrics, scientometrics, and altmetrics
Reproducibility and research integrity, including new challenges posed by 
generative AI
Peer review tools, principles and technology
Metadata and indexing
Inclusion of datasets and computational resources
Research infrastructures and digital libraries
Increasing the representation in scholarly work of disadvantaged populations
LLM-based interfaces to consume/produce scholarly documents
Impact of scholarly communication on popular discourse
Submission Information
Authors are invited to submit full and short papers with unpublished, original 
work. Submissions will be subject to a double-blind peer-review process. 
Accepted papers will be presented by the authors at the workshop either as a 
talk or a poster. All accepted papers will be published in the workshop 
proceedings (proceedings from previous years can be found here: 
https://aclanthology.org/venues/sdp/), 
which will be published in the ACL Anthology.
  
The submissions must be in PDF format and anonymized for review. All 
submissions must be written in English and follow the ACL 2025 formatting 
requirements: 
 
Long paper submissions: up to 8 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.

Submission Website: Paper submission has to be done through OpenReview:
https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
 
Final versions of accepted papers will be allowed 1 additional page of content 
so that reviewer comments can be taken into account.
Important Dates (Main Research Track)  
First call for workshop papers: December 19, 2024
Second call for workshop papers: February 21, 2025
Paper submission deadline: March 1, 2025
Pre-reviewed (ARR) submission deadline: March 25, 2025
Notification of acceptance: April 17, 2025
Camera-ready paper due: May 16, 2025
Workshop dates: July 31 – August 1, 2025
Note: Shared task submission deadlines and other important dates to be 
announced.

SDP 2025 Keynote Speakers
We are excited to have several keynote speakers at SDP 2025. 
 
Tom Hope <https://tomhoper.github.io/>, Assistant Professor at Hebrew 
University of Jerusalem and Research Scientist at Allen Institute for AI.
James A. Evans <https://sociology.uchicago.edu/directory/James-A-Evans>, 
Professor and Director of the Knowledge Lab at University of Chicago and 
External Professor at the Santa Fe Institute.
TBA
SDP 2025 Shared Tasks

SDP 2025 will host six exciting shared tasks. More information about all shared 
tasks is provided on the workshop 
website: https://sdproc.org/2025/sharedtasks.html
 
Detecting automatically generated scientific papers (DAGPap 25)
The ubiquity of generative AI has made it very easy to generate fake 
scientific papers. This can erode public trust in science and attack the 
foundations of science: are we standing on the shoulders of robots? The 
Detecting Automatically Generated Papers (DAGPap) competition aims to 
encourage the development of robust, reliable systems for detecting 
AI-generated scientific text, utilizing a diverse dataset and varied machine 
learning models in a number of scientific domains. 
Organizers: Savvas Chamezopoulos, Dan Li, Anita de Waard (Elsevier).
 
Contextualizing Scientific Figures and Tables (Context 25)
Interpreting scientific claims in the context of empirical findings is a 
valuable practice, yet extremely time-consuming for researchers. Such 
interpretation requires identifying key results (often captured in tables and 
figures) that provide supporting evidence from research papers, and 
contextualizing these results with associated methodological details (e.g., 
measures, sample, etc.). During the previous version of this shared task in 
2024, we released datasets to support the development of methods for automatic 
identification of key result figures or tables as well as additional grounding 
context to make claim interpretation more efficient. However, the released 
datasets contained tables and images already extracted from the scientific 
papers to allow participants to bypass PDF pre-processing issues. In Context 
2025, given recent advances in multimodal LLMs, we plan to increase the 
difficulty of this task by requiring participants to identify key results 
directly from paper PDFs, and to add a new sub-task on multi-hop reasoning over 
scientific evidence.
Organizers: Joel Chan, Matthew Akamatsu, Aakanksha Naik
 
Scientific Visual Question Answering (SciVQA)
Scholarly articles convey valuable information not only through unstructured 
text but also via (semi-)structured figures such as charts and diagrams. 
Automatically interpreting the semantics of knowledge encoded in these figures 
can be beneficial for downstream tasks such as question answering (QA). In the 
SciVQA challenge, the participants will develop multimodal systems capable of 
efficiently processing both visual (i.e., addressing attributes such as colour, 
shape, size, etc.) and non-visual QA pairs based on images of scientific 
figures and their captions. 
Organizers: Ekaterina Borisova, Georg Rehm
 
Scientific Fact-checking of Social Media Posts on Climate Change (ClimateCheck)
The ClimateCheck shared task focuses on fact-checking claims from social media 
about climate change against peer-reviewed scholarly articles. Participants 
will retrieve relevant publications from a corpus of 400,000 climate research 
articles and classify each abstract as supporting, refuting, or not having 
enough information about the claim. Training data will include human-annotated 
claim-publication pairs, and the evaluation will combine nDCG@K and Bpref for 
retrieval and F1 score for classification. The task aims to develop models that 
link social media claims to scientific evidence, promoting informed and 
evidence-based discussions on climate change.
Organizers: Raia Abu Ahmad, Georg Rehm
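
For readers unfamiliar with the retrieval metric named in the ClimateCheck description, the following is a minimal Python sketch of nDCG@K. It is an illustration only and not the official ClimateCheck scorer: the function names are hypothetical, the log2 discount is one common formulation, and the sketch assumes the input list holds a relevance label for every judged publication, in ranked order. The shared task may use a different variant or cutoff.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    # Rank positions start at 1, so the discount is log2(position + 1).
    return sum(rel / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """nDCG@K: DCG of the system ranking divided by DCG of the ideal ranking.

    Assumes `relevances` lists the label of every judged document in the
    order the system ranked them (an assumption made for this sketch).
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: the only relevant publication is ranked second out of two.
score = ndcg_at_k([0, 1], k=2)
```

A ranking that places all relevant publications first scores 1.0, and a ranking with no relevant publications scores 0.0 under this sketch.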
 
Software Mention Detection in Scholarly Publications (SOMD 25)
Software plays an essential role in computational research methods and is 
considered one of the crucial entities in scholarly documents. However, 
software is not always formally cited in academic documents, resulting in 
various informal mentions of software across a paper. Automatic identification 
of such software mentions contributes to better understanding, 
accessibility, and reproducibility of the research work. Beyond detecting the 
mention itself, understanding the research context requires identifying the 
purpose of a software mention and its attributes, making software mention 
detection a comprehensive task. 
We are extending our first iteration of the shared task SOMD 2024 
<https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html> with new 
challenges. In addition to information extraction techniques, our extended 
focus will be on Joint Named Entity and Relation Classification techniques.
Organizers: Sharmila Upadhyaya, Frank Krueger, Stefan Dietze
 
SciHal2025: Hallucination Detection for Scientific Content
Generative AI-enhanced academic research assistants are transforming how 
research is conducted. By allowing users to pose research-related questions in 
natural language, these systems can generate structured and concise summaries 
supported by relevant references. However, hallucinations — unsupported claims 
introduced by large language models — remain a significant obstacle to fully 
trusting these automatically generated scientific answers.
 
 
Organizing Committee
Tirthankar Ghosal, Oak Ridge National Laboratory, USA
Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany
Aakanksha Naik, Allen Institute for AI, USA
Amanpreet Singh, Allen Institute for AI, USA
Anita de Waard, Elsevier, Netherlands
Dayne Freitag, SRI International, USA
Georg Rehm, German Research Center for Artificial Intelligence (DFKI), Germany
Sonja Schimmler, Fraunhofer FOKUS, Germany
Dan Li, Elsevier, Netherlands
 

