Postdoc position at IRIT, Toulouse (France) – ANR AnDiAMO
Developing systems towards robust discourse parsing and its application
https://pagesperso.irit.fr/~Chloe.Braud/andiamo/ 
* Contract duration: 12 months (flexible)
* Starting date: December 2022 (flexible)
* Location: IRIT, Université P. Sabatier (Toulouse III)
* Remuneration: starting at 2,745 euros, gross salary, depending on experience
* Application deadline: the position will remain open until filled
* Send application by email to [email protected]
Application procedure: please send a CV and a short letter motivating your 
application by detailing the following elements:
* indicate your **skills in machine learning**, e.g. the type of tasks you 
have already worked on, the type of algorithms, the libraries used. Please 
specify your experience with neural architectures and pre-trained language 
models.
* describe your **interest and/or experience in natural language processing**, 
i.e. the type of tasks you have already tried to solve, if any, or similar 
problems you have worked on, or why you now want to work in NLP and why you 
think your experience in another domain could be relevant.
* If you are interested but hold a master's or engineering degree rather than 
a PhD, and your CV fits the requirements, please send me an email with the 
same information as above.
Incomplete applications will not be considered.

The AnDiAMO project:
Natural Language Processing (NLP) is a domain at the frontier of AI, computer 
science and linguistics, aiming at developing systems able to automatically 
analyze textual documents. Within NLP, discourse parsing is a crucial but 
challenging task: its goal is to produce structures describing the 
relationships (e.g. explanation, contrast...) between spans of text in full 
documents, allowing inferences to be drawn about their content. Developing 
high-performing and robust discourse parsers could help to improve downstream 
applications such as automatic summarization or translation, 
question-answering, chat bots, e.g. [1,2,3]. However, current performance is 
still low, mainly due to the lack of annotated data (see e.g. [4] on 
monologues, [5] on dialogues, [6,7] for the multilingual setting).
In order to develop robust discourse parsers within the AnDiAMO project, we 
want to explore multi-objective settings, where the goal is ultimately to 
perform a discourse analysis, but relying on another related objective such as 
performing well on another task (e.g. morphological, syntactic or temporal 
analysis), or an application (e.g. sentiment analysis or argument mining). We 
will also explore the issues of cross-language and cross-framework learning.
The position is funded by the ANR AnDiAMO project, for which an engineer has 
already been hired; master's interns will also be recruited. Collaborations are 
planned with researchers in Toulouse, Grenoble, Nancy and Munich. The hired 
person will be part of the MELODI team at IRIT, participating in team and 
project meetings, and co-authoring articles.

Research plan:
The recruited candidate will work on one or several of the following topics, 
depending on their interests:
- Data representation: Discourse processing requires information from various 
levels of linguistic analysis. For now, existing studies do not make it clear 
what kind of information is important and needed, and some potentially relevant 
sources of information are ignored. We plan to explore this issue within a 
multi-task learning setting, where a system has to jointly learn different 
tasks. We will experiment on classification tasks (discourse relation, 
segmentation) and on full discourse parsing.
- Transferring to new languages, domains and modalities: Developing systems 
that perform well on domains or languages that are different from those used at 
training time is crucial, especially if the adaptation can be done in an 
unsupervised way. It is especially important for discourse, since annotation is 
very hard and time-consuming. We plan to experiment with cross-lingual 
embeddings and to explore multi-task learning, while trying to understand how 
to integrate additional linguistic information with only little annotated data 
for auxiliary tasks. We also want to investigate dialogues, for which only a 
few discourse parsers exist, and better understand how they differ from 
monologues.
- Extrinsic evaluation: We will investigate a few downstream applications that 
could benefit from discourse information, as a way to give an extrinsic 
evaluation. We will explore pipeline systems, varying the way we encode the 
discourse information as input of our end system. We will also explore transfer 
learning strategies, either via multi-task learning or representation learning. 
We plan to start with cognitive impairment detection (e.g. schizophrenia, 
Alzheimer's disease) and argument mining. More applications will be considered, depending 
on the interest of the recruited postdoc.
It will be possible to investigate other paths of research, such as few-shot or 
unsupervised learning, depending on the interest of the recruited 
candidate.

Profile
* PhD degree in computer science or computational linguistics
* Good knowledge of machine learning is required
* Interest in language technology / NLP
* Good programming skills: preferably with Python, knowledge of PyTorch is a 
plus

References
[1] Feng, X., Feng, X., Qin, B., and Geng, X. Dialogue Discourse-Aware Graph 
Model and Data Augmentation for Meeting Summarization. In Proceedings of IJCAI. 
2019.
[2] Bawden, R., Sennrich, R., Birch, A., and Haddow, B. Evaluating Discourse 
Phenomena in Neural Machine Translation. In Proceedings of NAACL. 2018.
[3] Xu, J., Gan, Z., Cheng, Y., and Liu, J. Discourse-Aware Neural Extractive 
Text Summarization. In Proceedings of ACL. 2020.
[4] Koto, F., Lau, J. H., and Baldwin, T. Top-down Discourse Parsing via 
Sequence Labelling. In Proceedings of EACL. 2021.
[5] Liu, Z., and Chen, N. Improving Multi-Party Dialogue Discourse Parsing via 
Domain Integration. In Proceedings of the 2nd Workshop on Computational 
Approaches to Discourse. 2021.
[6] Braud, C., Coavoux, M., and Søgaard, A. Cross-lingual RST Discourse 
Parsing. In Proceedings of EACL. 2017.
[7] Liu, Z., Shi, K., and Chen, N. DMRST: A Joint Framework for Document-Level 
Multilingual RST Discourse Segmentation and Parsing. In Proceedings of the 2nd 
Workshop on Computational Approaches to Discourse. 2021.