1st Workshop on Creating Interoperable Corpora of Historical Newspapers 
(PressMint)
Second Call for Papers
Date: May 16, 2026, half-day workshop
Location: Palma de Mallorca, Spain
Website: https://www.clarin.eu/PressMint-LREC2026
Submission Deadline: 1 March 2026
Submission link:  https://softconf.com/lrec2026/PressMint/

Advertisement/Tagline
Unlock pan-European history! Join the PressMint workshop to build and analyze 
multilingual, interoperable corpora of historical newspapers!
Workshop description
Historical newspapers are of interest to historians and historical linguists, 
as well as to social and political scientists, ethnologists, anthropologists, 
media and communication scholars, and researchers in cultural studies. In all 
of these fields, contemporary digital resources, tools and methods (e.g. 
“distant reading”) are still underutilised. At the same time, corpora of 
historical newspapers already exist for a number of languages and countries, 
largely because the material is out of copyright, and the page images, often 
with OCR, are available through national libraries. In recent years these data 
have attracted growing interest from researchers because they preserve the 
historical, cultural, political and societal past. However, these corpora are 
not interoperable, which precludes methods for comparing them as well as any 
translingual and transnational research. This is an especially important 
consideration, as statehood and nationhood were highly dynamic in Europe in 
the period to be covered by the project corpora. An initial joint attempt to 
create a corpus of historical newspapers from the beginning of the 20th 
century onwards is the CLARIN flagship project PressMint 
<https://www.clarin.eu/pressmint>. The project currently features data from 20 
partners and aims to develop a standard for interoperable newspaper resources 
spanning diachronic time periods. The final goal is to provide structured, 
high-quality multilingual data in a common format, with the same type of 
linguistic annotation, covering (at least partially) the same time period.
Objective
The PressMint workshop aims to gather experts interested in creating, 
processing and analyzing interoperable corpora of historical data in general, 
with a particular focus on newspapers. Another important objective is to 
consider the perspective of the communities who use historical data: their 
purposes, requirements and feedback.
We encourage interested colleagues to present work at the national and 
pan-European levels, on monolingual and multilingual data, and in 
task-specific as well as multidisciplinary settings. We view this workshop as 
a venue for exchanging research ideas and starting collaborations on this 
topic.

The workshop will feature one invited speaker: Maud Ehrmann, EPFL, CH
We invite unpublished original work focusing on (but not limited to) the 
following topics:

  * compilation, annotation, visualisation and utilisation of historical 
    newspaper corpora of the period relevant to PressMint (ideally around the 
    start of the 20th century, but not constrained to this period)
  * harmonisation of existing multilingual historical newspaper corpora that 
    contain synchronic data, diachronic data, or both
  * linking or comparing historical newspaper corpora with other datasets, 
    including sources of structured knowledge, such as formal ontologies and 
    LOD datasets
  * enrichment of historical newspaper corpora (e.g. with sentiment 
    annotation)
  * machine translation of historical newspaper corpora
  * employment of LLMs as standalone tools or as parts of NLP architectures 
    for historical data processing, maintenance and knowledge deployment
  * various usage scenarios for historical data

Submission & Publication
We accept submissions of long papers (6 to 8 pages), short papers (4 pages) 
and demo papers (4 pages), to be presented as long or short oral presentations 
or as posters at the workshop. To support double-blind reviewing, all 
submissions must be fully anonymized and should be formatted according to the 
stylesheet available on the LREC 2026 website 
<https://lrec2026.info/authors-kit/>. The workshop papers will be published in 
online proceedings.
At the time of submission, authors are also offered the opportunity to share 
related language resources with the community. All repository entries are 
linked to the LRE Map [https://lremap.elra.info/], which provides metadata for 
the resources.
Please note that the LREC style guide should be followed. The formatting 
guidelines can be found here: https://lrec2026.info/authors-kit/.
Important Dates

  * Paper submission deadline: 1 March 2026
  * Notification of acceptance: 15 March 2026
  * Camera-ready paper: 30 March 2026
  * Workshop date: 16 May 2026


Organizing Committee

  * Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of 
    Sciences, PL
  * Tanja Wissik, Austrian Academy of Sciences, AT
  * Petya Osenova, Sofia University “St. Kl. Ohridski” & Bulgarian Academy of 
    Sciences, BG


The workshop is supported by the CLARIN research infrastructure and the 
PressMint Project.
To contact the organisers, please email [email protected]

