Dear Friends and Colleagues,

Whether you are a champion or a skeptic of LLMs, they are here to stay. How the story ends is still unknown. We believe a critical part of the plot will be the progress we make in evaluating LLMs, both in the training and testing (inference) phases. We currently lack appropriate, shared, and transparent evaluation methodologies and metrics.
We would like to invite you to review and contribute to the proposed open model for LLM evaluation. It is a framework you can use as-is, in part, or help modify and extend.

Article: Evaluation of Response Generation Models: Shouldn’t It Be Shareable and Replicable? <https://aclanthology.org/2022.gem-1.12/>. In Proceedings of the 2nd GEM Workshop @EMNLP 2022.

Repo (code, UI, guidelines, etc.): https://github.com/sislab-unitn/Human-Evaluation-Protocol

Publications utilizing the proposed protocol:
1. Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps? <https://aclanthology.org/2023.nlp4convai-1.1/> (Mousavi et al., NLP4ConvAI 2023)
2. Are LLMs Robust for Spoken Dialogues? <https://arxiv.org/abs/2401.02297> (Mousavi et al., IWSDS 2024)

Best Regards
----
Prof. Dr.-Ing. Giuseppe Riccardi
Founder and Director of the Signals and Interactive Systems Lab
Department of Computer Science and Engineering
University of Trento
