Dear Friends and Colleagues,

Whether you are a champion or a skeptic of LLMs, they are here to stay.
The ending of the movie is still unknown.
We believe a critical aspect of the plot development
will be the progress we make in evaluating LLMs,
in both the training and testing (inference) phases.
We lack appropriate, shared, and transparent
evaluation methodologies and metrics.

We would like to invite you to review and contribute to the proposed open
model for LLM evaluations. It is a framework you can use as-is, in part,
or help modify and extend.

Article: Evaluation of Response Generation Models: Shouldn’t It Be Shareable 
and Replicable? <https://aclanthology.org/2022.gem-1.12/>. 
In Proceedings of the 2nd GEM Workshop @EMNLP 2022.
Repository (code, UI, guidelines, etc.):
https://github.com/sislab-unitn/Human-Evaluation-Protocol

Publications utilizing the proposed protocol:
1. Response Generation in Longitudinal Dialogues: Which Knowledge 
Representation Helps? <https://aclanthology.org/2023.nlp4convai-1.1/> (Mousavi 
et al., NLP4ConvAI 2023)
2. Are LLMs Robust for Spoken Dialogues? <https://arxiv.org/abs/2401.02297>
(Mousavi et al., IWSDS 2024)

Best regards,

----
Prof. Dr.-Ing. Giuseppe Riccardi
Founder and Director of the Signals and Interactive Systems Lab
Department of Computer Science and Engineering
University of Trento 


