Conversational agents offer promising opportunities for education, as they can 
fulfill various roles (e.g., intelligent tutors and service-oriented 
assistants) and pursue different objectives (e.g., improving student skills and 
increasing instructional efficiency), among which serving as an AI tutor is one 
of the most prevalent. Recent advances in Large Language Models (LLMs) provide 
our field with promising ways of building AI-based conversational tutors that 
can generate human-sounding dialogue on the fly. The key question posed in 
previous research, however, remains open: *How can we test whether 
state-of-the-art generative models are good AI teachers, capable of replying to 
a student in an educational dialogue?*

In this shared task, we will focus on educational dialogues between a student 
and a tutor in the mathematical domain that are grounded in student mistakes or 
confusion, where the AI tutor aims to remediate these mistakes. The goal is to 
evaluate the quality of tutor responses along four key dimensions of the 
tutor's ability to (1) identify the student's mistake, (2) point to its 
location, (3) provide the student with relevant pedagogical guidance, and 
(4) make that guidance actionable. The dialogues used in this shared task 
include dialogue contexts from the MathDial (Macina et al., 2023) and Bridge 
(Wang et al., 2024) datasets, including the last student utterance containing a 
mistake, together with a set of responses to that utterance from a range of 
LLM-based tutors and, where available, human tutors, all aimed at mistake 
remediation and annotated for their quality.
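
For illustration only, a single instance in this setup can be thought of as a 
dialogue context plus a set of annotated candidate responses. The sketch below 
is hypothetical; its field names, label values, and example content are our 
assumptions, not the official data format:

```python
# Hypothetical sketch of one shared-task instance; field names, label values,
# and example content are illustrative assumptions, not the official format.
instance = {
    "conversation_history": [
        {"speaker": "tutor", "text": "What is 3/4 of 20?"},
        # The last student utterance contains the mistake to be remediated
        # (3/4 of 20 is 15, not 12).
        {"speaker": "student", "text": "I think it is 12."},
    ],
    "tutor_responses": {
        "Tutor_A": {
            "response": "Not quite. What do you get if you first divide 20 by 4?",
            # One quality label per evaluation dimension (Tracks 1-4).
            "annotations": {
                "mistake_identification": "Yes",
                "mistake_location": "Yes",
                "pedagogical_guidance": "Yes",
                "actionability": "Yes",
            },
        },
        # ... further responses from other LLM-based and, where available,
        # human tutors.
    },
}
```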

**Tracks**

This shared task will include five tracks. Participating teams are welcome to 
take part in any number of tracks.
- Track 1 - Mistake Identification: Participants are invited to develop systems 
to detect whether tutors' responses recognize mistakes in students' solutions 
(a minimal illustrative sketch of such a system is given after this list).
- Track 2 - Mistake Location: Participants are invited to develop systems to 
assess whether tutors' responses accurately point to genuine mistakes and their 
locations in the students' responses.
- Track 3 - Pedagogical Guidance: Participants are invited to develop systems 
to evaluate whether tutors' responses offer correct and relevant guidance, such 
as an explanation, elaboration, hint, or examples.
- Track 4 - Actionability: Participants are invited to develop systems to 
assess whether tutors' feedback is actionable, i.e., whether it makes clear 
what the student should do next.
- Track 5 - Guess the Tutor Identity: Participants are invited to develop 
systems to identify which tutors the anonymized responses in the test set 
originated from.
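
To make the track setup concrete, here is a minimal, non-official sketch of 
what a Track 1 (Mistake Identification) system could look like: a simple text 
classifier over (dialogue context, tutor response) pairs. The input format, 
label set, and toy examples below are assumptions for illustration only:

```python
# Minimal illustrative baseline for Track 1 (Mistake Identification):
# classify whether a tutor response recognizes the student's mistake.
# The input format, label set, and toy data are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each input concatenates the dialogue context (ending with the student's
# erroneous utterance) and the tutor response under evaluation.
train_texts = [
    "Student: 3/4 of 20 is 12. [SEP] Tutor: Not quite -- recheck 20 divided by 4.",
    "Student: 3/4 of 20 is 12. [SEP] Tutor: Great job, that's correct!",
]
train_labels = ["Yes", "No"]  # does the response recognize the mistake?

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(
    ["Student: 3/4 of 20 is 12. [SEP] Tutor: Check the division step first."]
))
```

A competitive submission would likely rely on stronger models (e.g., fine-tuned 
or prompted LLMs), but the overall input/output structure would be similar.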


**Participant registration**

All participants should register using the following link: 
https://forms.gle/fKJcdvL2kCrPcu8X6


**Important dates**

All deadlines are 11:59pm UTC-12 (anywhere on Earth).

- March 12, 2025: Development data release
- April 9, 2025: Test data release
- April 23, 2025: System submissions from teams due
- April 30, 2025: Evaluation of the results by the organizers
- May 21, 2025: System papers due
- May 28, 2025: Paper reviews returned
- June 9, 2025: Final camera-ready submissions
- July 31 and August 1, 2025: BEA 2025 workshop at ACL


**Shared task website**: https://sig-edu.org/sharedtask/2025

**Organizers**
- Ekaterina Kochmar (MBZUAI)
- Kaushal Kumar Maurya (MBZUAI)
- Kseniia Petukhova (MBZUAI)
- KV Aditya Srivatsa (MBZUAI)
- Justin Vasselli (Nara Institute of Science and Technology)
- Anaïs Tack (KU Leuven)

**Contact**: [email protected]
