BioCreative IX Challenge and Workshop CFP
Large Language Models for Clinical and Biomedical NLP at IJCAI
Where, When:
The BioCreative IX 
workshop<https://www.ncbi.nlm.nih.gov/research/bionlp/biocreative9> will run 
with IJCAI 2025<https://2025.ijcai.org/>, August 16-22, 2025, in Montreal, Canada.
BioCreative IX:
The 9th BioCreative workshop seeks to attract researchers interested in 
developing and evaluating automatic methods for extracting medically relevant 
information from clinical data, and aims to bring together the medical NLP 
community with healthcare researchers and practitioners. The challenge 
tracks are MedHopQA, a dataset for benchmarking LLM-based reasoning systems 
with disease-centered question-answer pairs; ToxHabits, a task on extracting 
information about substance use and abuse from Spanish clinical content; and 
sentence segmentation of real clinical notes from MIMIC-III. We will also 
feature paper submissions on relevant topics and poster/tool demonstrations.
Important Dates
March - April: Team Registration
May 12, 2025: Testing predictions, Evaluation results
May 19, 2025: Deadline for submission of participants' papers
Jun 06, 2025: Notification of paper acceptance
Aug 16 - Aug 22, 2025: IJCAI 2025

Workshop Proceedings and Special Issue:
The BioCreative IX Proceedings will host all the submissions from participating 
teams, and they will be freely available by the time of the workshop.
In addition, selected papers will be invited to a BioCreative IX journal 
special issue, subject to the journal's peer-review process. More details and 
submission information will be posted in June.

Participation:
Teams can participate in one or more of these tracks. Team registration will 
continue until April 30th, when final commitment is requested.
To register a team, go to the Registration 
Form<https://forms.gle/xbQp158cn5pgJ1oj9>. If you have trouble accessing 
Google Forms, please send an e-mail to [email protected].

Call for Papers
We welcome submissions describing research on topics similar to the three 
challenges, as well as:

  *   Development of benchmarking datasets for clinical NLP
  *   Creating and evaluating synthetic data using LLMs and its impact on 
downstream tasks
  *   Creative use of data augmentation for increasing tool accuracy and 
trustworthiness
  *   Use of LLMs to streamline annotation tasks
  *   NLP systems capable of identifying entities in multilingual corpora
  *   NLP systems capable of semantic interoperability across different 
terminologies/ontologies for efficient data curation
  *   Integrating ontologies and knowledge bases for factual LLM production
  *   Annotated corpora and other resources for health care and biomedical data 
modelling
All submissions will be considered for poster presentations and tool 
demonstrations at the workshop.

BioCreative IX Tracks:
Track 1: MedHopQA
Large language models (LLMs) are commonly evaluated on their capabilities to 
answer questions in various domains, and it has become clear that robust QA 
datasets are critical to ensure proper evaluation of LLMs prior to their 
deployment in real-world biomedical or healthcare related applications. This 
track aims to advance the development of LLM-based systems that are capable of 
answering questions that involve multi-step reasoning. We have created a 
resource of 1,000 question-answer pairs based on public information in 
Wikipedia, focusing on diseases, genes and chemicals, mostly pertaining to 
rare diseases. Participants are encouraged to use any training data they wish 
to design and develop NLP systems or agents that understand asserted 
information on genes, diseases, chemicals, etc., and can answer multi-step 
reasoning questions involving such information. This track builds on previous 
successes in biomedical QA benchmarking (e.g., PubMedQA, BioASQ, and MedQA) 
but differs from them in that MedHopQA requires a multi-step reasoning 
process to find the correct answer.
Track 3: ToxHabits
There is a pressing need to extract information on substance use and abuse 
from clinical content more systematically, covering not only smoking and 
alcohol abuse but also other harmful drugs and substances. These toxic 
habits have a considerable health impact on a variety of medical conditions and 
also affect the action of prescribed medications. To make such information 
actionable, it is critical to not only detect instances of consumption, but 
also to characterize certain aspects related to it, such as duration or mode of 
administration. Some initial efforts have been made to automatically detect 
social determinants of health, including smoking status, for content in 
English, but very limited efforts have been made for content in other 
languages. Therefore, we propose the ToxHabits track to address the automatic 
extraction of substance use and abuse information from clinical cases in 
Spanish. This task will consist of three subtasks: (a) toxic habit mention 
recognition, (b) detection of relevant clinical modifiers related to substance 
abuse, and (c) a toxic habit condition QA challenge.
Track 2: Sentence segmentation of real-life clinical notes
Sentence segmentation is a fundamental linguistic task and is widely used as a 
pre-processing step in many NLP tasks. Although the development of LLMs and the 
sparse attention mechanisms in transformer networks have reduced the need 
for sentence-level inputs in some NLP tasks, many models are designed and tested 
only for shorter sequences. The need for sentence segmentation is particularly 
pronounced in clinical notes, as most clinical NLP tasks depend on this 
information for annotation and model training. In this shared task, we 
challenge participants to detect sentence boundaries (spans) for MIMIC-III 
clinical notes, where fragmented and incomplete sentences, complex graphemic 
devices (e.g., abbreviations and acronyms), and markup are common. To 
encourage generalizability to multi-domain texts, participants will receive 
annotated texts from newswire articles and biomedical literature, in addition 
to clinical notes, for model development and evaluation.
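
To illustrate what predicting sentence boundaries as spans can look like, here 
is a minimal Python sketch of a naive rule-based baseline. It is not the 
track's official format or method: the function name sentence_spans, the 
(start, end) character-offset convention with an exclusive end, and the sample 
note text are all assumptions made for illustration only.

```python
import re
from typing import List, Tuple


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets (end exclusive) per segment.

    Hypothetical baseline: a segment ends at '.', '!' or '?' followed by
    whitespace, or at the end of a line. This is an illustrative assumption,
    not the official task definition.
    """
    spans = []
    # Treat every non-empty line separately, since clinical notes often
    # contain fragments that end at a line break without punctuation.
    for line in re.finditer(r"[^\n]+", text):
        offset = line.start()
        # Inside a line: start at a non-space character, then stop at the
        # first sentence-final punctuation followed by whitespace or the end
        # of the line; otherwise take the rest of the line.
        for seg in re.finditer(r"\S(?:.*?[.!?](?=\s|$)|.*$)", line.group()):
            # Drop trailing whitespace from the segment.
            length = len(seg.group().rstrip())
            start = offset + seg.start()
            spans.append((start, start + length))
    return spans


if __name__ == "__main__":
    # Made-up sample text, not taken from MIMIC-III.
    note = "Pt c/o SOB. Denies CP\nHR 88  BP 120/80"
    for start, end in sentence_spans(note):
        print(start, end, repr(note[start:end]))
```

Rules this simple will over-split at abbreviations such as "Dr." and will miss 
boundaries that lack both punctuation and a line break, which is the kind of 
difficulty this track targets.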
Organizing Committee

  *   Dr. Rezarta Islamaj, National Library of Medicine
  *   Dr. Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center
  *   Dr. Martin Krallinger, Barcelona Supercomputing Center
  *   Dr. Zhiyong Lu, National Library of Medicine

----------------------------------------------------------
Rezarta Islamaj

National Library of Medicine
[email protected]<mailto:[email protected]>

