On 5/31/17 1:09 AM, Patrick Schluter wrote:
In any case, you can download the dataset from  if you like. There
are several zip files of 100 MB each, containing a collection of TMX files
(Translation Memory eXchange) with European legislation. The files
contain multi-aligned texts in up to 24 languages. The files are
encoded in UCS-2 little-endian. I know for a fact (because I compiled
the data) that they don't contain characters outside of the BMP. The
data is public and can be used freely (as in beer).
When I get some time, I will try to port the Java app that is
distributed with it to D (partially done already).
Thanks, I'll bookmark it for later use.
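For anyone reading such files later, here is a minimal sketch of decoding UCS-2 little-endian data while checking the "nothing outside the BMP" claim. It is written in Python only for brevity. The function name decode_ucs2_le is hypothetical, and the sketch assumes the raw bytes have already been extracted from the zip archive. Python has no dedicated UCS-2 codec, so it decodes as UTF-16-LE, which gives the same result for BMP-only text, and then rejects any code point above U+FFFF.

```python
import codecs


def decode_ucs2_le(data: bytes) -> str:
    """Decode UCS-2 little-endian bytes, rejecting anything outside the BMP.

    Hypothetical helper for illustration; not part of the dataset's tooling.
    """
    # Drop a leading byte-order mark (FF FE) if the data starts with one.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]

    # UCS-2 uses exactly two bytes per character.
    if len(data) % 2 != 0:
        raise ValueError("odd byte count: not valid UCS-2")

    # UTF-16-LE and UCS-2 LE are identical as long as no surrogate pairs occur.
    text = data.decode("utf-16-le")

    # A code point above U+FFFF means the input contained a surrogate pair,
    # so it was really UTF-16 rather than plain UCS-2.
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"character outside the BMP: U+{ord(ch):06X}")
    return text
```

If the check never fails on the real data, a fixed two-bytes-per-character reader (for example, treating the buffer as an array of 16-bit units) should be safe for these files.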