Dear all,

Please find below the call for participation for the workshop on Diversity 
in Large Speech and Language Models.

Date: 20 February 2025
Place: Humboldt-Universität Berlin, Dorotheenstraße 24, Berlin, Germany

Machine learning techniques have conquered many tasks in speech and natural 
language processing, such as speech recognition, information extraction, 
text and speech generation, and human-machine interaction via natural 
language or speech (chatbots). Modern techniques typically rely on large 
models that represent general knowledge of one or several languages (Large 
Language Models, LLMs), or that represent speech and general audio 
characteristics. These models have been trained on large amounts of speech 
and language data, typically including web content. When humans interact 
with such technologies, the effectiveness of the interaction depends on how 
closely their language matches the type of language the models have been 
trained on or, in other words, on whether the models are able to generalize 
to the language humans use when interacting with the technology. This may 
lead to gradual forms of adaptation in human speech and language 
production, and users who do not adapt may be excluded from efficient use 
of such technologies. On top of this, as commercial model development 
follows market needs, under-represented languages and dialects/sociolects 
may decrease in priority. Furthermore, for many less widely spoken 
languages the necessary data is not available, which will widen the digital 
divide in speech and language technology usage.

The workshop sets out to discuss this problem based on scientific 
contributions from the perspectives of computer science and linguistics 
(including computational linguistics and NLP).
Topics we aim to address include, but are not limited to:

- User diversity: Which aspects of human speech and language production 
  affect the performance of large foundation models? In which way, and for 
  which tasks?
- Language use: How well do large language models cope with different 
  languages, dialects, and sociolects? How do they deal with code-switching?
- Human adaptation: How does the use of large language models affect 
  language comprehension, as well as speech and language production? Which 
  alignment effects occur, and over which time spans?
- Model adaptation: How do models need to be designed to better cope with 
  speech and language diversity? How do training and fine-tuning affect 
  model performance?
- Inclusion: Which data and technologies are necessary to better cope with 
  diversity in large speech and language models?
The workshop will consist of a number of oral presentations and discussion 
panels. Accepted speakers are invited to submit a short or long paper, 
which will be published online after the workshop.

Details and registration: 
https://www.tu.berlin/en/qu/about-us/news/isca-itg-workshop

Best,
Stefan Hillmann

--
Dr.-Ing. Stefan Hillmann
Wissenschaftlicher Mitarbeiter / Senior Researcher
er, ihm / he, him
Anrede / Form of address: Herr / Mr.

Technische Universität Berlin
Fakultät IV / Faculty 4
Elektrotechnik und Informatik / Electrical Engineering and Computer Science
Quality and Usability Lab
Sekr. MAR 6-7, Marchstr. 23, 10587 Berlin, GERMANY

