Utrecht University, The Netherlands

In NLP, there is a growing recognition that data quality is key to better 
language models, yet we still know very little about the link between data and 
model behavior. In this project, we will develop methods to measure the 
diversity of NLP datasets, assess the impact of diversity on NLP models, and 
improve data collection and model training.

As a PhD student, you will develop innovative methods to measure the diversity 
of NLP datasets. A major focus will be on measuring dataset diversity from a
sociolinguistic perspective, taking into account language variation (such as
styles and dialects) and combining (socio)linguistic insights with neural
language modeling. You will also draw on relevant disciplines, particularly the
social sciences, that have developed approaches for measuring diversity.
Furthermore, you will carry out experiments to assess the impact of data 
diversity on NLP models, with a focus on fairness and robustness, and 
investigate ways to leverage data diversity to improve NLP models.

You will join the NLP & Society Lab, headed by Dong Nguyen, where we work on a 
variety of topics, including computational sociolinguistics, analysis of online 
conversations, data-centered NLP, and evaluation of NLP models. We are part of 
the wider NLP group within the Department of Information and Computing Sciences 
of Utrecht University (UU), the Netherlands.

For more details and to apply, visit the link below:
https://www.uu.nl/en/organisation/working-at-utrecht-university/jobs/phd-position-on-data-diversity-for-fair-and-robust-nlp-datadivers-project
  (Deadline: Jan 5)

Contact: Dong Nguyen ([email protected])
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/