[Apologies for cross-posting]

Terminology Translation Task at WMT2025 - Call for Participation

We are excited to announce the third Shared Task on Terminology 
Translation <https://www2.statmt.org/wmt25/terminology.html>, which will be run 
within the 10th Conference on Machine Translation (WMT2025) in Suzhou, China.

TL;DR:
- We test sentence-level and document-level translation of texts in the 
finance and IT domains, given explicit terminology.
- The language pairs are: English -> {Spanish, German, Russian, Chinese}, 
Chinese -> English.
- We evaluate overall translation quality, terminology success rate, and 
terminology consistency. Additionally, we compare system performance under 
three conditions: no terminology, proper terminology, and random terms.
- The task starts on 20th June 2025 AoE; the submission deadline is 20th July 
2025 AoE.
- Please pre-register via Google Forms here: 
https://forms.gle/ZSn2pNJkQJAzHFnA6 .

OVERVIEW

The advances in neural MT and LLM-assisted translation over the last decade 
have brought nearly human quality in general-domain translation, at least for 
high-resource languages. However, in specialized domains such as science, 
finance, or legal texts, where the correct and consistent use of special terms 
is crucial, the task is far from solved. The Terminology Shared Task aims to 
assess the extent to which machine translation models can utilize additional 
information about the translation of terms. Compared to the two previous 
editions (2021 and 2023), the new test data contain more varied test cases, 
are more consistent in domain within each translation direction, and are 
broader in language coverage.

TASK DESCRIPTION

Track №1: Sentence/Paragraph-Level Translation

You will be provided with a sequence of input texts, each a sentence or a 
paragraph long, and small terminology dictionaries that correspond only to the 
terms present in the given input.

Language Pairs:

  *   en-de (English → German)
  *   en-ru (English → Russian)
  *   en-es (English → Spanish)

Domain: information technology

Track №2: Document-Level Translation

The setup is similar to Track №1, with two exceptions: each input text is now a 
whole document, and the dictionaries correspond to the whole set of input texts 
(i.e. they are corpus-level). This brings the task closer to the real-life 
setup (where dictionaries exist independently of the texts), but it may 
complicate the implementation (solutions that need to store the whole 
dictionary will require more memory). Additionally, in the whole-document 
setup, the consistent use of terms becomes more important.
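
One possible way to handle a corpus-level dictionary is to narrow it down to 
the entries that actually occur in each document before translation. The 
sketch below is only an illustration under stated assumptions: the function 
name, the dictionary entries, and the naive case-insensitive substring 
matching are all made up for this example and are not part of the official 
task tooling.

```python
def terms_in_document(document: str, corpus_dictionary: dict[str, str]) -> dict[str, str]:
    """Keep only the entries whose source term occurs in the document."""
    lowered = document.lower()
    return {
        source_term: target_term
        for source_term, target_term in corpus_dictionary.items()
        if source_term.lower() in lowered  # naive substring match, no tokenization
    }


# Made-up corpus-level dictionary (English -> Traditional Chinese).
corpus_dictionary = {
    "hedge fund": "對沖基金",
    "interest rate": "利率",
    "bond": "債券",
}

document = "The hedge fund reduced its exposure to interest rate risk."
document_terms = terms_in_document(document, corpus_dictionary)
print(document_terms)
```

Substring matching can over-match (for example, a short term inside a longer 
word), so a real system would likely need tokenization or lemmatization.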

Language Pairs:

  *   en-zh-Hant (English → Traditional Chinese)
  *   zh-Hant-en (Traditional Chinese → English)

Domain: finance

EVALUATION

Terminology Modes:
You are expected to compare your system’s performance under three modes:

1. No terminology: the system is only provided with input sentences/documents.
2. Proper terminology: the system is provided with input texts (same as 1.) and 
dictionaries of the format {source_term: target_term}.
3. Random terminology: the system is provided with input texts and translation 
dictionaries in the same format as in 2. The difference is that the dictionary 
items are not special terms but words randomly drawn from the input texts. This 
mode is of special interest because it lets us measure to what extent proper 
term translations (2.) improve system performance, as opposed to arbitrary 
additional input that contains no domain-specific terminology.
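
For a prompt-based system, the three modes could be fed to the model roughly 
as sketched below. This is a hypothetical illustration: the prompt wording, 
the function build_prompt, and the example sentence and dictionaries are 
assumptions made for this example, not a format prescribed by the task.

```python
from typing import Optional


def build_prompt(source_text: str, terminology: Optional[dict[str, str]] = None) -> str:
    """Build a translation prompt, optionally listing term translations."""
    lines = ["Translate the following text from English to German."]
    if terminology:
        lines.append("Use these term translations:")
        lines.extend(f"- {src} -> {tgt}" for src, tgt in terminology.items())
    lines.append(f"Text: {source_text}")
    return "\n".join(lines)


source = "Restart the server after updating the firewall."
proper_terms = {"server": "Server", "firewall": "Firewall"}  # made-up domain terms
random_terms = {"restart": "neu starten", "after": "nach"}   # made-up random words

prompts = {
    "no_terminology": build_prompt(source),
    "proper": build_prompt(source, proper_terms),
    "random": build_prompt(source, random_terms),
}
for mode, prompt in prompts.items():
    print(f"[{mode}]\n{prompt}\n")
```

Keeping the prompt identical apart from the dictionary makes the three runs 
directly comparable.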

Metrics:

1. Overall Translation Quality: we will evaluate the general aspects of machine 
translation outputs such as fluency, adequacy and grammaticality, using general 
automatic MT metrics such as BLEU or COMET. In addition, we will pay special 
attention to the grammaticality of the translated terms.
2. Terminology Success Rate: this metric assesses the ability of the system to 
accurately translate technical terms given the specialized vocabulary. This 
will be carried out by comparing the occurrences of the correct term 
translations (i.e. the ones present in the dictionary) to the output terms. A 
higher success rate indicates closer adherence to the dictionary translations.
3. Terminology Consistency: for domains such as science or legal texts, the 
consistent use of an introduced term throughout the text is crucial. In other 
words, we want a system not only to pick a correct term in the target language 
but also to use it consistently once it is chosen. This will be evaluated by 
comparing all translations of a given source term in a text and measuring the 
percentage of deviations from the most consistent translation. This metric is 
more important for the Document-Level track, but it will be used for both 
tracks.
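
As a rough illustration of metrics 2 and 3, the sketch below computes a simple 
success rate and a consistency deviation. It is not the official scorer. It 
assumes case-insensitive substring matching for term occurrences, reads "the 
most consistent translation" as the most frequent one, and assumes the 
translations of each source term have already been extracted from the output 
(for example by word alignment). All function names and example data are 
made up.

```python
from collections import Counter


def terminology_success_rate(outputs: list[str], dictionaries: list[dict[str, str]]) -> float:
    """Share of dictionary entries whose target term appears in the matching output."""
    hits = 0
    total = 0
    for output, dictionary in zip(outputs, dictionaries):
        lowered = output.lower()
        for target_term in dictionary.values():
            total += 1
            hits += target_term.lower() in lowered  # True counts as 1
    return hits / total if total else 0.0


def consistency_deviation(translations_of_term: list[str]) -> float:
    """Share of occurrences that differ from the most frequent translation."""
    if not translations_of_term:
        return 0.0
    counts = Counter(translations_of_term)
    most_frequent_count = counts.most_common(1)[0][1]
    return 1 - most_frequent_count / len(translations_of_term)


outputs = ["Starten Sie den Server neu.", "Die Brandmauer blockiert den Port."]
dictionaries = [{"server": "Server"}, {"firewall": "Firewall", "port": "Port"}]
success = terminology_success_rate(outputs, dictionaries)

# Translations of the source term "firewall" found across one document.
deviation = consistency_deviation(["Firewall", "Firewall", "Brandmauer", "Firewall"])
print(success, deviation)
```

Note that the two scores are independent: a system can use one translation 
consistently and still fail the success rate if that translation is not the 
one given in the dictionary.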

IMPORTANT DATES
All deadlines are end of day, Anywhere on Earth (AoE).

Data snippets released: 7th May 2025
Dev data released: 22nd May 2025
Test data released, task starts: 20th June 2025 (postponed)
Submission deadline: 20th July 2025 (postponed)
Paper submission to WMT25: in line with WMT25
Camera-ready submission to WMT25: in line with WMT25
Conference in Suzhou, China: 05-09 November 2025

SUBMISSION GUIDELINES

0. Please notify us about your participation prior to submission. This is 
optional, but it will help us estimate our workload after the submission 
deadline. Please do so through this Google Form: 
https://forms.gle/ZSn2pNJkQJAzHFnA6
1. Check your submission files with the validation script. It will be published 
when the test data are released.
2. Write a description of your system (optional).
3. Submit your system via Google Forms. The Google Form with all necessary 
submission details will be published when the test set is released.

All submission details, as well as an FAQ, can be found on the webpage of the 
shared task.

ORGANIZERS

  *   Kirill Semenov (University of Zurich), main contact: FirstNаmе [dоt] 
LаstNаmе {аt} uzh /dоt/ ch
  *   Nathaniel Berger (Heidelberg University)
  *   Pinzhen Chen (University of Edinburgh & Aveni.ai)
  *   Xu Huang (Nanjing University)
  *   Arturo Oncevay (JP Morgan)
  *   Dawei Zhu (Amazon)
  *   Vilém Zouhar (ETH Zurich)

WEBSITE: https://www2.statmt.org/wmt25/terminology.html

If you have any questions, please email Kirill Semenov (see the address above).
