--- apologies for cross-postings ---
Dear colleagues,
We have an open position for a postdoctoral researcher in natural
language processing / information retrieval / machine learning (SCAI/BnF
research program).
Starting period: autumn 2022
Duration: 12-month postdoctoral contract (renewable)
Location: Sorbonne University (MLIA team, ISIR lab) / DataLab of
the BnF
Supervision:
Laure Soulier, Associate Professor (MCF) in computer science at Sorbonne
University, MLIA team, ISIR.
Emmanuelle Bermès, Scientific and Technical Assistant to the Director of
Services and Networks at BnF.
Jean-Philippe Moreux, Scientific expert of Gallica at the BnF.
More info:
https://scai.sorbonne-universite.fr/public/news/view/27d72d260c950c8d66c6/1
_*Context*_
Gallica, the digital library of the BnF, contains nearly 10 million
digitized documents that are freely accessible online (18.5 million
visits per year). However, most users do not know that Gallica contains
not only printed documents, but also photographs, sound recordings,
videos, and 3D objects. In satisfaction surveys, only a minority of
users consider the search engine's answers to be relevant and a majority
would like to be better guided in their searches. A recommendation
system should be able to help users find their way through the mass of
collections and improve the visibility of the least-known ones. In this
project, the BnF is committed to adopting a resolutely ethical approach. The
use of user logs must respect users' privacy and guarantee both
the relevance and transparency of the algorithms, avoiding the risk of
filter bubbles. Interface design is also at the heart of the
approach: a trustworthy system relies on a good user experience and on
the diversity and relevance of the proposed recommendations. Three lines
of thought emerge:
1) How can predictive algorithms be developed from the available data,
including both user logs and collection descriptions?
2) How can diversity be integrated into the recommendation algorithm while
letting users set their own serendipity threshold?
3) How can user trust be built through algorithm design and auditing?
_*Main missions*_
This project addresses information access in the Gallica library using
machine learning and deep learning techniques.
The research axes concern (1) the analysis and indexing of textual
documents as well as (2) the analysis of user traces and (3)
recommendation systems. We are particularly interested in multimodal
techniques that allow contextualizing a document or a query based on
user interactions.
The successful candidate will be responsible for:
● Implementing models to learn the semantics of textual data for the
purpose of indexing them.
● Developing algorithms based on representation learning methodologies
to effectively blend text and user traces.
● Reporting and presenting development work clearly and effectively,
both in discussions with BnF experts and in machine learning
publications.
The printed book collection will be the primary focus of the program
described above, but an extension to other collections with textual
descriptors (in particular iconographic collections) may be considered.
--
-------------
Laure Soulier
Maître de conférences
Equipe MLIA - Laboratoire ISIR - Sorbonne Université
Tour 26, Couloir 26-00, Bureau 515
(+33) 1 44 27 74 91
https://pages.isir.upmc.fr/soulier/