* Title: Diving into neural language models for improving discourse analysis tasks
* Keywords: Neural Language Models, Discourse analysis, Argumentative structure, Probing, Transfer Learning
* Supervisors: [email protected] and
[email protected]
* Location: TALN@LS2N, Nantes, France - https://taln-ls2n.github.io
* Starting date: January 2023 (flexible); duration: about 6 months
* Opportunity: to pursue a PhD in the Lexhnology ANR project
https://www.ls2n.fr/stage-these/diving-into-neural-language-models-for-improving-discourse-analysis-tasks
# MISSION
Fine-tuning a pre-trained language model has become the de facto
standard for handling natural language processing tasks. Since many of
these tasks deal with discourse and dialogue structures (e.g.
conversational agents, summarization, dialogue act recognition,
argumentation mining), it is crucial to understand how such information
is captured by language models and to study how to intervene in the
learning of this type of information: what is learned, what is missing,
how to add it, and how to preserve the useful information in a
fine-tuned, distilled, pruned, or quantized model.
The exact internship mission will be defined within this context, in
collaboration with the candidate. One possibility would be to start by
probing language models on discourse analysis tasks.
We hope that the successful candidate will go on to pursue a PhD on
this subject within the Lexhnology project.
* A. Rogers, O. Kovaleva, and A. Rumshisky. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics (TACL), 8:842–866, 2020.
* V. Araujo, A. Villa, M. Mendoza, M.-F. Moens, and A. Soto. Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2021.
* M. Lukasik, B. Dadachev, G. Simões, and K. Papineni. Text Segmentation by Cross Segment Attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4707–4716, November 16–20, 2020.
* L. Huber, C. Memmadi, M. Dargnat, and Y. Toussaint. Do Sentence Embeddings Capture Discourse Properties of Sentences from Scientific Abstracts? In Proceedings of the First ACL Workshop on Computational Approaches to Discourse, pages 86–95, 2020.
* F. Koto, J. H. Lau, and T. Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 20th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Mexico (virtual), 2021.
# THE LEXHNOLOGY PROJECT
Lexhnology is a project funded by the French National Research Agency
(ANR). It will start on January 2, 2023, and run for 42 months.
Given the growing extraterritoriality of American law, this domestic law
increasingly affects the jurisdictions of other countries. It is
therefore of prime importance that second-language (L2) users of legal
English be able to analyze case law. Teaching argumentative structure
to L2 learners is a widely accepted method in languages for specific
purposes (LSP) L2 teaching and learning, and may help learners
understand the legally binding rationale behind judicial decisions.
Despite this context, there is not yet a consensus on the linguistic
definition of the communicative functions in case law, also known as
moves. In addition, no Natural Language Processing (NLP) technique is
currently able to identify moves in case law automatically. Finally,
the effectiveness of making moves explicit to L2 learners has not been
measured experimentally.
To address these gaps, Lexhnology will take an innovative
interdisciplinary approach combining linguistics, NLP, and LSP
teaching/learning. The project is a joint collaboration between four
laboratories: LS2N, CRINI, LAIRDIL, and ATILF.
# APPLICATION
The successful candidate is expected to:
* Hold, or be preparing, a Master's degree (or equivalent) in Natural
Language Processing, Computer Science, Computational Linguistics, or
Data Science;
* Have an excellent background in deep learning and, more generally,
machine learning;
* Have strong programming skills (software development and Python);
* Have good oral and written communication skills (in French/English);
* Be comfortable working both in a team and autonomously;
* Be dynamic and curious.
We look forward to receiving your online application, including:
* a letter of motivation
* a CV
* contacts for two references