On Thursday, 17 May 2018 at 15:37:01 UTC, Andrei Alexandrescu wrote:
On 05/17/2018 09:14 AM, Patrick Schluter wrote:
I'm in charge at the European Commission of the biggest translation memory in the world.

Impressive! Is that the Europarl?

No, Euramis. It is the central translation memory developed by the Commission and also used by the other institutions. The database contains more than a billion segments from parallel texts and is afaik the biggest of its kind. One of the big strengths of the Euramis TM is its multi-target language store, which allows fuzzy searches in all language combinations, including indirect translations (e.g. if a document written in English was translated into Romanian and into Maltese, it is then possible to search for alignments between ro and mt). It's not the only system to do that, but at that volume it is quite unique. Every year we also publish an extract of it, covering the published legislation from the Official Journal [1], so that it can be used by the research community. All the machine translation engines use it. It is one of the most accessed data collections on the European Open Data Portal [2].
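To make the indirect alignment idea concrete, here is a minimal sketch in Python. It is only an illustration, not how Euramis is implemented: the in-memory `SEGMENTS` list, the function names `alignments` and `fuzzy_search`, the sample sentences, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical store: one entry per segment, keyed by language code.
# The sample texts were written for this sketch, not taken from Euramis.
SEGMENTS = [
    {"en": "The regulation enters into force today.",
     "ro": "Regulamentul intră în vigoare astăzi.",
     "mt": "Ir-regolament jidħol fis-seħħ illum."},
    {"en": "This decision is addressed to the Member States.",
     "ro": "Prezenta decizie se adresează statelor membre."},
]

def alignments(segments, src, tgt):
    """Return (src_text, tgt_text) pairs for segments stored in both languages."""
    return [(s[src], s[tgt]) for s in segments if src in s and tgt in s]

def fuzzy_search(segments, query, src, tgt, threshold=0.7):
    """Return (score, src_text, tgt_text) hits, best match first.

    The score is difflib's similarity ratio, an assumed stand-in for
    whatever fuzzy-match measure a real translation memory uses.
    """
    hits = []
    for src_text, tgt_text in alignments(segments, src, tgt):
        score = SequenceMatcher(None, query.lower(), src_text.lower()).ratio()
        if score >= threshold:
            hits.append((score, src_text, tgt_text))
    return sorted(hits, reverse=True)
```

Calling `fuzzy_search(SEGMENTS, "Regulamentul intră în vigoare", "ro", "mt")` searches Romanian text and returns the Maltese side of any match, even though the original document was written in English. Because every language version hangs off the same segment, no language pair needs its own table.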

The very uncommon thing about the backend software of Euramis is that it is written in C. Pure unadulterated C. I'm trying to introduce D, but with the strange (to put it politely) configurations our servers have, it is quite challenging.

[1]: https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory
[2]: http://data.europa.eu/euodp/fr/data
