On 5/31/17 1:09 AM, Patrick Schluter wrote:
In any case, you can download the dataset from [1] if you like. There
are several 100 MB zip files containing a collection of TMX files
(Translation Memory eXchange) with European legislation. The files
contain multi-aligned texts in up to 24 languages. The files are
encoded in UCS-2 little-endian. I know for a fact (because I compiled
the data) that they don't contain characters outside of the BMP. The
data is public and can be used freely (as in beer).
When I get some time, I will try to port the Java app that is
distributed with it to D (partially done already).

[1]: https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory
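
For anyone who wants to look inside the files before a D port exists, here is a minimal sketch in Python of decoding UCS-2 little-endian data while checking the "nothing outside the BMP" claim. The function name decode_ucs2_le is made up for illustration and is not part of the dataset's tooling. The sketch relies on UCS-2 being the subset of UTF-16 without surrogate pairs, so it decodes as UTF-16LE and rejects any code point above U+FFFF.

```python
import codecs


def decode_ucs2_le(data: bytes) -> str:
    """Decode UCS-2 little-endian bytes, rejecting anything outside the BMP.

    Hypothetical helper for illustration only.
    """
    # Drop a leading byte-order mark (FF FE) if one is present.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    # Every UCS-2 code unit is exactly two bytes.
    if len(data) % 2 != 0:
        raise ValueError("UCS-2 data must have an even number of bytes")
    # Valid UCS-2 is also valid UTF-16, so the UTF-16LE codec can decode it.
    # Lone surrogates make the strict decoder raise UnicodeDecodeError.
    text = data.decode("utf-16-le")
    # A surrogate pair decodes to a code point above U+FFFF, which UCS-2
    # cannot represent, so treat it as an error.
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"character U+{ord(ch):04X} is outside the BMP")
    return text
```

To apply it to one extracted TMX file, read the file in binary mode and pass the bytes to the function; the resulting string can then be handed to any XML parser.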

Thanks, I'll bookmark it for later use.

-Steve
