🚀 Second Call for Interest: DISRPT 2025 Shared Task on Discourse Relation 
Parsing and Treebanking. 
🛎️ Sample data has been released! 
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on 
discourse processing across formalisms, for a variety of languages and genres, 
with three subtasks:
 
* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification
 
We will provide training, development and test datasets from (almost) all 
available languages in RST / eRST, SDRT, PDTB, ISO 24617, and discourse 
dependencies, using a uniform format. Because different corpora, languages, and 
frameworks use different guidelines, the shared task will promote the design of 
flexible methods for dealing with various guidelines, and will help to push 
forward the discussion of converging standards for discourse units. For 
datasets which have treebanks, we will evaluate segmentation in two different 
scenarios: with and without gold syntax. An automatically parsed version is 
provided for all corpora without a gold parse. 
 
This year, the shared task will feature: 
 * The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, 
   PDTB, ISO 24617, and discourse dependencies
 * The inclusion of new corpora and new languages, some of them kept a 
   surprise!
 * A unified set of labels for the discourse relations, to make evaluation 
   across datasets easier
 * A new constraint: only one multilingual model should be submitted per task, 
   and it should be small! This will make our replication work easier, but 
   more importantly, it will simplify using such a model and test the 
   robustness of your solution.
Today, we’re excited to announce the release of the sample data for the DISRPT 
2025 Shared Task! You can now access the data, format documentation, and tools 
on our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The sample covers five discourse frameworks (RST / eRST, PDTB, SDRT, and 
discourse dependencies) across 12 languages: English, Basque, French, Dutch, 
Italian, Portuguese, Spanish, Farsi, Chinese, Russian, Turkish, and Thai.
We invite researchers and teams interested in participating to register now. 
Registered participants will be added to our mailing list and receive all 
future updates.
📅 The full training data will be released on June 16, 2025 — stay tuned!
To join the mailing list and stay informed, please email us at:
📧 [email protected] 
Let us know you're interested — we’d love to have you on board!
**Important dates**
 
 * May 16 2025 – Sample data release [NOW]
 * June 16 2025 – Training data release
 * July 14 2025 – Test data release
 * August 1 2025 – System + paper submissions due
 * September 12 2025 – Notification of acceptance
 * September 19 2025 – Camera ready papers
 * November 8-9 2025 – CODI at EMNLP
All deadlines are 11:59 pm UTC-12 (AoE, "Anywhere on Earth").
 
**Information:**
 
Contact the organizers: [email protected] 
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants (please join us): 
[email protected]
 
 
**Organization:**
 
Peter Bourgonje (Universität Potsdam, Germany)
Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)