The Computational Linguistics group (GroNLP) of the Center for Language and 
Cognition Groningen (CLCG) is looking for a PhD student in “Language technology 
for cultural heritage: New discoveries with little data” within the HAICu 
research project. The HAICu project is a large-scale Dutch research project by 
universities and cultural-heritage institutions into new forms of Artificial 
Intelligence-based access to multimodal Cultural-Heritage data, both 
contemporary and historical. Within HAICu, AI researchers, Digital Humanities 
researchers and a wide range of public and private partners will co-develop 
scientific solutions to unlock the true societal potential of the current 
heterogeneous digital heritage collections. The project will provide easier, 
richer and more reliable data access to citizens, journalists, civic 
organisations, and various other stakeholders.

HAICu is funded by the NWO National Science Agenda (NWA) and has a budget of 
about EUR 10 million. HAICu started in January 2024 and will run for 6 years 
(until Jan 2030). For more information about HAICu, please see 
https://www.haicu.science/

The PhD Project
This specific PhD position is about effectively dealing with missing and sparse 
labels in datasets from humanities fields such as literature, history, and 
philosophy. Cultural 
heritage institutions, and especially the National Library of the Netherlands, 
offer access to a lot of digitized data which can be leveraged through 
computational approaches. However, this data is very often incomplete. This is 
a challenge for typical machine learning methods, which rely on being fed 
representative and complete data; the resulting systems cannot handle 
distribution shifts or extrapolate beyond their training set.

Recent developments in artificial intelligence have shown that large language 
models are able to learn from small amounts of training data, or even none at 
all (few-shot and zero-shot learning). Paired with increasingly accessible 
techniques for specializing existing models for target domains and tasks, a lot 
of new possibilities open up for cultural heritage data, which will be explored 
within this project. Examples of possible topics include:

- Investigating literary reception and prestige over time.
- Detecting and mapping intertextuality within texts.
- Uncovering the influences and biases over time in datasets.
- Monitoring the evolution of concepts in textual datasets.
- Improving the robustness of models to out-of-distribution data.

The project will, in collaboration with the National Library of the 
Netherlands, be coordinated by Andreas van Cranenburgh, Tommaso Caselli, and 
Malvina Nissim at the University of Groningen. This is an interdisciplinary 
project at the intersection of Computational Linguistics/Natural Language 
Processing (NLP) and the humanities.

You will be asked to

- Develop a specific research proposal within the proposed theme.
- Review the academic literature relevant to the project’s goals.
- Carry out research, present your results and author scientific articles on 
the above-mentioned topics.
- Collaborate with members of the Computational Linguistics group at the 
University of Groningen, the National Library, and with the broader HAICu 
consortium.
- Engage and collaborate with other researchers working on computational 
humanities research.
- Complete a PhD thesis written in English in the specified timeframe (4 years).
- Collaborate on outreach and public engagement activities.
- Gain teaching experience.

This PhD project offers a unique opportunity to work in an international 
environment and to acquire valuable research experience: you will be carrying 
out research in the context of the Computational Linguistics group of the 
Center for Language and Cognition Groningen (CLCG) of the University of Groningen, and 
will be spending at least one day a month at the National Library in The Hague.

For more information, see 
https://www.rug.nl/about-ug/work-with-us/job-opportunities/?details=00347-02S000AYDP
