> Just in case, let me tell you that there seem to be several corpora (and
> acquis) published by the JRC. The one I was referring to in my previous
> message can be downloaded here: http://langtech.jrc.it/DGT-TM.html#Download .

The corpus that I meant is the JRC-Acquis Multilingual Parallel Corpus
(http://langtech.jrc.it/JRC-Acquis.html). I wasn't asking so much about
technical difficulties or bugs in the corpus text, and I'm aware of the
Koehn/Birch/Steinberger paper "462 Machine Translation Systems for Europe".
My question is rather: has anyone reached unexpected conclusions on this
corpus, for instance a method of improving the SMT output that worked on, say,
Europarl but didn't work on the same (or other) language pairs on the
JRC-Acquis parallel corpus?

Thanks in advance,
Mark & Heiki

>> Dear readers,
>>
>> we keep getting strange, unexpected and sometimes illogical results in
>> more than one series of SMT experiments using the JRC Acquis parallel
>> corpus. Often the same methods work fine on Europarl. Our question is
>
> Hi Mark,
>
> We have been using the JRC Acquis corpus *extensively* and I can assure you
> that we had no big problems. Some colleagues, who used the program that
> comes with the corpus, did have some slight problems. I chose to unzip the
> volumes manually and never ran into them. For this corpus, as for others,
> some characters can derail the training. We have developed Moses for Mere
> Mortals (http://code.google.com/p/moses-for-mere-mortals/), which provides a
> Windows add-in (Extract_TMX_Corpus) that helps to clean such things and
> creates corpora that you can feed directly to Moses (UTF-8, Linux newlines,
> removal of control characters and so on). So I can assure you that the JRC
> Acquis definitely works. It seems to me that the Moses team has already
> published data about their experiments with this corpus. It covers most, if
> not all, of the language pairs of the European Union, which is a plus.
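
As a rough illustration of the kind of cleaning mentioned above (Linux
newlines, removal of control characters), here is a minimal Python sketch. It
is not the Extract_TMX_Corpus code, and the function names clean_segment and
clean_parallel are hypothetical. It assumes the two sides of the corpus are
already aligned line by line, and it drops a pair when either side ends up
empty so that the alignment is preserved.

```python
import unicodedata


def clean_segment(text):
    """Return one segment without BOM, control characters or stray whitespace."""
    # Remove a byte-order mark that may have leaked into the text.
    text = text.replace("\ufeff", "")
    # Keep whitespace for now (it is collapsed below); drop every other
    # character in Unicode category Cc (control characters).
    kept = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Collapse all whitespace runs (including \r, \n and tabs) to one space.
    return " ".join(kept.split())


def clean_parallel(src_lines, tgt_lines):
    """Clean two aligned lists of segments and drop pairs with an empty side."""
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = clean_segment(src), clean_segment(tgt)
        if src and tgt:
            pairs.append((src, tgt))
    return pairs
```

When writing the cleaned pairs back to disk, opening the output files with
open(path, "w", encoding="utf-8", newline="\n") should give UTF-8 text with
Linux newlines regardless of the platform.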
>
> Greetings,
>
> João
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
