*CALAMITA - Challenge the Abilities of LAnguage Models in ITAlian*

*Special event co-located with the Tenth Italian Conference on
Computational Linguistics - CLiC-it 2024 Pisa, 4 - 6 December, 2024 -
https://clic2024.ilc.cnr.it/ *


*Upcoming deadline: 17th May 2024, challenge pre-proposal submission!
Pre-proposal form: *https://forms.gle/u4rSt9yXHHYquKrB6

*Project Description*

AILC, the Italian Association for Computational Linguistics, is launching a
*collaborative* effort to develop a dynamic and growing benchmark for
evaluating LLMs’ capabilities in Italian.

In the *long term*, we aim to establish a suite of tasks in the form of a
benchmark which can be accessed through a shared platform and a live
leaderboard. This would allow for ongoing evaluation of existing and newly
developed Italian or multilingual LLMs.

In the *short term*, we are looking to start building this benchmark
through a series of challenges collaboratively constructed by the research
community. Concretely, this happens through the present call for challenge
contributions. In a similar style to standard Natural Language Processing
shared tasks, *participants are asked to contribute a task and the
corresponding dataset with which a set of LLMs should be challenged*.
Participants are expected to provide an explanation and motivation for a
given task, a dataset that reflects that task together with any information
relevant to the dataset (provenance, annotation, distribution of labels or
phenomena, etc.), and a rationale for the way the dataset was put together.
Evaluation metrics and example prompts should also be provided. Existing
relevant datasets are also very welcome, together with related publications
if available. All proposed challenges, whether based on existing or new
datasets, will have to follow the challenge template, which will be
distributed in due time, towards the write-up of a challenge paper.

In this first phase, all prospective participants are asked to submit a
*pre-proposal* by filling in this form https://forms.gle/u4rSt9yXHHYquKrB6.
Please fill in all the fields so we can get an idea of what challenge you’d
like to propose, how the model should be prompted to perform the task,
where you’d get the data and how much, whether it’s already available, etc.

The organizers will examine the submitted pre-proposals and select those
challenges that comply with the template’s requirements, with an eye to
balancing different challenge types. The selected challenges will be
expanded with a full dataset, longer descriptions, etc., according to the
aforementioned template, which will be distributed later. The final report
of each accepted challenge must provide the evaluation code, together with
an example that runs smoothly on a pre-selected base LLM (most likely
LLaMa-2), which will be communicated by the organisers in the second phase.
All reports will be published as CEUR Proceedings related to the CALAMITA
event. Subsequently, all challenge organisers who wish to be involved can
participate in a broader follow-up paper, targeting a top venue, which will
describe the whole benchmark, procedures, results, and analyses.

Once this first challenge set is put together, the *CALAMITA organizers*
will run *zero*- or *few*-shot experiments with a selection of LLMs, and
write a final report. No tuning materials or experiments are expected at
this stage of the project.
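
As a rough illustration of what a zero- or few-shot evaluation could look
like, here is a minimal Python sketch. Everything in it is an assumption
made for illustration only: the function names (build_prompt, accuracy,
dummy_model), the "Testo:/Risposta:" prompt layout, and the use of accuracy
as the metric are hypothetical and are not part of the challenge template,
which has not been distributed yet. The model is a stand-in callable, not
the pre-selected base LLM.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def build_prompt(instruction: str, demos: Sequence[Example], query: str) -> str:
    """Assemble a prompt: zero-shot if demos is empty, few-shot otherwise."""
    parts = [instruction]
    for text, label in demos:
        # Each demonstration shows an input together with its gold answer.
        parts.append(f"Testo: {text}\nRisposta: {label}")
    # The query is left unanswered so the model completes it.
    parts.append(f"Testo: {query}\nRisposta:")
    return "\n\n".join(parts)


def accuracy(
    model: Callable[[str], str],
    instruction: str,
    demos: Sequence[Example],
    test_set: Sequence[Example],
) -> float:
    """Fraction of test items whose normalised model output equals the gold label."""
    correct = 0
    for text, gold in test_set:
        prediction = model(build_prompt(instruction, demos, text))
        correct += prediction.strip().lower() == gold.strip().lower()
    return correct / len(test_set)


def dummy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: always answers 'positivo'."""
    return "positivo"


if __name__ == "__main__":
    instruction = "Classifica il sentimento del testo come positivo o negativo."
    demos = [("Che bella giornata!", "positivo")]  # use [] for zero-shot
    test_set = [("Ottimo film.", "positivo"), ("Servizio pessimo.", "negativo")]
    print(accuracy(dummy_model, instruction, demos, test_set))
```

Passing an empty list of demonstrations gives the zero-shot setting, and a
real model would simply replace dummy_model with a function that sends the
prompt to an LLM and returns its text output.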

*Deadlines (tentative)*

   - *17th May 2024: pre-proposal submission*
   - 27th May 2024: notification of pre-proposal acceptance
   - End of May 2024: distribution of challenge paper template and further
   instructions
   - 2nd September 2024: data and report submission
   - 30th September 2024: benchmark ready with reports for each challenge
   (after light review)
   - October-November 2024: running selected models on the benchmark with
   analyses
   - 4th-6th December 2024: CLiC-it Pisa (special event co-located with
   CLiC-it 2024)

*Website:* https://clic2024.ilc.cnr.it/calamita (under construction)

*Mail:* [email protected]

*Organizers*

   - Pierpaolo Basile (University of Bari Aldo Moro)
   - Danilo Croce (University of Rome, Tor Vergata)
   - Malvina Nissim (University of Groningen)
   - Viviana Patti (University of Turin)