(Apologies for cross-posting)

Conversational agents offer promising opportunities for education: they can 
fulfill various roles (e.g., intelligent tutors and service-oriented 
assistants) and pursue different objectives (e.g., improving student skills and 
increasing instructional efficiency), among which serving as an AI tutor is one 
of the most prevalent. Recent advances in Large Language Models (LLMs) provide 
our field with promising ways of building AI-based conversational tutors that 
can generate human-sounding dialogues on the fly. The key question posed in 
previous research, however, remains: how can we test whether state-of-the-art 
generative models are good AI teachers, capable of replying to a student in an 
educational dialogue?

In this shared task, we focus on educational dialogues between a student and a 
tutor in the mathematical domain that are grounded in student mistakes or 
confusion, where the AI tutor aims to remediate them. The goal is to evaluate 
the quality of tutor responses along the key dimensions of the tutor’s ability 
to (1) identify the student’s mistake, (2) point to its location, (3) provide 
the student with relevant pedagogical guidance, and (4) make that guidance 
actionable. The dialogues used in this shared task comprise dialogue contexts 
from the MathDial (Macina et al., 2023) and Bridge (Wang et al., 2024) 
datasets, including the last student utterance containing a mistake, together 
with a set of responses to that utterance from a range of LLM-based tutors 
and, where available, human tutors, aimed at mistake remediation and annotated 
for their quality.

Data Release
We are pleased to announce that the test data has now been released and can be 
accessed at 
https://github.com/kaushal0494/UnifyingAITutorEvaluation/blob/main/BEA_Shared_Task_2025_Datasets/mrbench_v3_testset.json.
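For convenience, here is a minimal Python sketch for loading and inspecting the 
test set after downloading the JSON file from the link above. This is only an 
illustration: the file name is taken from the URL, and the top-level JSON 
structure is assumed (not guaranteed) to be a list of dialogue instances.

    import json

    # Minimal sketch: inspect the released test set after downloading
    # mrbench_v3_testset.json from the repository linked above.
    with open("mrbench_v3_testset.json", encoding="utf-8") as f:
        data = json.load(f)

    # Assumption: the top level is a list of dialogue instances; if it is a
    # dict, fall back to its values. Print one record to see the actual fields.
    records = data if isinstance(data, list) else list(data.values())
    print(f"{len(records)} instances")
    print(json.dumps(records[0], indent=2)[:800])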

Test Platform
The competition is hosted on the CodaBench platform 
(https://www.codabench.org/), with a separate page for each track.

Track 1 – Mistake Identification: https://www.codabench.org/competitions/7195/
Track 2 – Mistake Location: https://www.codabench.org/competitions/7200/
Track 3 – Providing Guidance: https://www.codabench.org/competitions/7202/
Track 4 – Actionability: https://www.codabench.org/competitions/7203/
Track 5 – Tutor Identification: https://www.codabench.org/competitions/7206/

Registered teams are welcome to participate in any number of tracks.

Participation
In order to participate in the test phase, you will need to create an account 
on CodaBench (https://www.codabench.org/), if you don't already have one. After 
that, please register for the specific track(s) you wish to submit your 
systems' predictions to. By participating in this shared task, you agree to 
the Terms outlined on the shared task track webpages (see the "Terms" tab).

The total number of submissions per team is capped at 5 for each track 
(with a maximum of 2 submissions per day). The platform will ask you to 
provide your team name and a title for each submission – the latter may be 
useful for distinguishing between your different submissions. All submissions 
will then be reflected on the CodaBench platform together with the 
accompanying information (team name, affiliation, and submission name). Please 
note that we will publish the official final leaderboard on the shared task 
website (https://sig-edu.org/sharedtask/2025), where only the first 5 
submissions per team will be included, in order to adhere to the terms of this 
shared task.

To be added to the shared task mailing list for further updates, please 
register here: https://forms.gle/fKJcdvL2kCrPcu8X6

Important dates

All deadlines are 11:59pm UTC-12 (anywhere on Earth).

- March 12, 2025: Development data release
- April 10, 2025: Test data release
- April 24, 2025: System submissions from teams due
- April 30, 2025: Evaluation of the results by the organizers
- May 21, 2025: System papers due
- May 28, 2025: Paper reviews returned
- June 9, 2025: Final camera-ready submissions
- July 31 and August 1, 2025: BEA 2025 workshop at ACL

Contact: [email protected]
Shared task website: https://sig-edu.org/sharedtask/2025


