[Corpora-List] PhD in NLP - PATRIMALP Materials, Pigments, Lights: the colors of Heritage – Natural Language Processing for cultural heritage

François Portet Mon, 22 Aug 2022 01:21:39 -0700

PhD in NLP - PATRIMALP Materials, Pigments, Lights: the colors of Heritage – Natural Language Processing for cultural heritage
Starting date: October 01, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 12th, 2022
Salary: 1 975 € gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
Keywords: natural language processing, knowledge representation, cultural heritage, transfer learning, multilingualism
CONTEXT
The main challenge of the Patrimalp project is the development of an integrated and interdisciplinary Heritage Science, in order to ensure cultural Heritage sustainability, promotion and dissemination in contemporary society. The ambition is to produce the forms of intelligibility of a global and moving process which starts from the collection of the raw material, its transformation into a primitive object, its different lives as a material (alterations, degradations, transformations ...) and as a symbol (relegation, disinterest, oblivion or rebirth, exaltation...) throughout history, and finally its election as an object of historical and Heritage value and its “promotion” into a work of art. This research is applied to understand how inks and pigments have been conceived over several centuries, how they have been used in artworks, and how handcrafting methods have evolved and been disseminated across centuries and countries.
To make this study possible, the project will gather a large collection of textual material made up of alchemical works and collections of natural or artificial objects collected between the 16th and 18th centuries. To better understand the choice of colors for these "wonders", we want to reconstruct the recipes for making colored material in its context of thought, whether technical or symbolic. These recipes will constitute a new body of research for literary scholars and a new data-study case for building knowledge about color. This corpus indeed offers modes of representation inscribed in complex forms of writing and fiction whose modalities and frames of reference remain to be analyzed (accounts of technical, medical or physico-chemical experiments inscribed in fictional worlds or mythological, symbolic descriptions of artifacts, or materials collected in nature, mines). On the linguistic level, the inventory of this lexicon in different European and Eastern languages will lead to the formalization of the knowledge of these various skills over time and across several cultures. This corpus will thus provide complex data on the material and symbolic origin of the ingredients of color, on its use, its names and its physical or symbolic perception: these data represent a challenge for computer scientists, who will have to organize them, for the benefit of curators, chemists or physicists, in ontologies representing the state of knowledge from the point of view of scholars over the ages.
To systematically explore the corpus of these recipes, we will use NLP techniques to uncover the correlations between recipes, physical and chemical composition of objects and symbolic references. The final objective is to build a knowledge base (objects, components of objects, materials, colors, know-how, reference framework), each of its parts being able to reference a specific ontology (ontology of pigments, materials, colors...), so that researchers can observe, from this specific corpus, the trajectory from the writing of color to its technical and artisan practice.
PHD OBJECTIVES
The PhD project will focus on segmenting, extracting and representing recipes from a corpus of alchemical works from the 16th to the 18th centuries to make them accessible to researchers in the humanities. This requires the candidate to:
·    identify which excerpts of the text belong to a recipe;
·    supervise an annotation campaign to build an analysis and training corpus;
·    build NLP tools to automatically extract the list of elements (raw material, tools, quantity, units) and actions (verb, adverb, adjective) that make up the recipes;
·    analyze the dependencies between the elements of a recipe to derive rules;
·    represent these rules in a formal knowledge representation.

The results of this processing will support:
·    the documentation of this unique set of texts, by inserting the extracted elements into the document metadata to ease retrieval;
·    the building of a knowledge base of alchemical recipes.
This PhD will need to address several challenges. One of them is to be able to process text composed of multiple non-modern languages (French, German, English, Latin, Greek) [Coavoux2022, Grobol2022]. One approach will be to study how large multilingual pre-trained models [Delvin2019, Xue2020] can be leveraged and adapted for the task and how disparate collections of corpora of ancient texts can be used to fine-tune them. Another challenge will be the paucity of data for the downstream tasks (segmentation, parsing, Natural Language Understanding [Desot2022]); for this, we will need to identify other related corpora (e.g. cooking) to address the problem in a multitask setting (such as NER and NLU) and with transfer learning.
SKILLS
·    Master 2 in Natural Language Processing, computer science or data science.
·    Good command of Python programming and deep learning frameworks.
·    Previous experience in text classification, parsing, processing of several languages or text retrieval would be a plus.
·    Very good communication skills in English and good command of French


SCIENTIFIC ENVIRONMENT
The thesis will be conducted within the STEAMER and GETALP teams of the LIG laboratory (http://steamer.imag.fr/ and https://lig-getalp.imag.fr/). The GETALP team has strong expertise and a track record in Natural Language Processing; the STEAMER team has strong expertise in knowledge representation and reasoning. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment. The means to carry out the PhD will be provided both in terms of missions in France and abroad and in terms of equipment (personal computer, access to the LIG GPU servers).
The PhD candidate will collaborate with the partners involved in the PATRIMALP project, in particular with Laurence Rivière from the LUHCIE lab (Laboratoire Universitaire Histoire Cultures Italie Europe) and Véronique Adam from the LITT&ARTS lab (Littératures et Arts).
INSTRUCTIONS FOR APPLYING
Applications must contain: CV + letter/message of motivation + master's grade transcripts + be ready to provide letter(s) of recommendation; and be addressed to Danielle Ziebelin ([email protected]), François Portet ([email protected]) and Maximin Coavoux ([email protected]).
REFERENCES
[Coavoux2022] Maximin Coavoux, Corinne Denoyelle, Olivier Kraif, Julie Sorba. Phraséologie du roman médiéval en prose. Diachro X – le français en diachronie, Sorbonne Université, May 2022, Paris, France.
[Delvin2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL.
[Desot2022] Desot, T., Portet, F., & Vacher, M. (2022). End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting. Computer Speech & Language, 75, 101369.
[Grobol2022] Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary and Benoit Crabbé. BERTrade: Using Contextual Embeddings to Parse Old French. 13th International Conference on Language Resources and Evaluation (LREC 2022), May 2022, Marseille, France.
[Xue2020] Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., ... & Raffel, C. (2020). mT5: A massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934.

--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE

Phone:  +33 (0)4 57 42 15 44
Email:  [email protected]
www:    http://membres-liglab.imag.fr/portet/

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
