Hi 

It's difficult to answer this question without more specific information, but 
yes, in general it is possible for techniques to work on one corpus and fail 
on another. Changing the language pair, genre, corpus size, etc. could all 
change the characteristics of an SMT system.

regards
Barry
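
For reference, the cleanup steps mentioned in the quoted message below (UTF-8,
Linux newlines, removal of control characters) could be sketched roughly as
follows. This is only an illustration in Python, not the Extract_TMX_Corpus
implementation; the function name clean_corpus_text and the choice to replace
undecodable bytes rather than fail are assumptions.

```python
import unicodedata


def clean_corpus_text(raw: bytes) -> str:
    """Decode as UTF-8, normalise newlines to LF, drop control characters."""
    # Assumption: undecodable bytes are replaced instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # Windows (CRLF) and lone CR line endings become Linux newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove Unicode control characters (category Cc), keeping newline and tab.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
```

If a stray carriage return sits inside a sentence on one side of a parallel
corpus only, converting it to a newline changes the line count, so it is worth
checking that both sides still have the same number of lines afterwards.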

On Wednesday 09 Jun 2010 13:00:39 Mark Fishel wrote:
> > Just in case, let me tell you that there seem to be several corpora (and
> > acquis) published by the JRC. The one I was referring to in my
> > previous message can be downloaded here:
> > http://langtech.jrc.it/DGT-TM.html#Download .
> 
> The corpus that I meant is the JRC-Acquis Multilingual Parallel Corpus
> (http://langtech.jrc.it/JRC-Acquis.html).
> 
> I wasn't talking so much about technical difficulties or corpus text
> bugs, and I'm aware of the Koehn/Birch/Steinberger paper "462 Machine
> Translation Systems for Europe" -- but rather, has anyone had
> unexpected conclusions on this corpus, for instance something (like a
> method of improving the SMT output) that worked on, say, Europarl and
> didn't work on the same (or other) language pairs on the JRC-Acquis
> parallel corpus?
> 
> Thanks in advance,
> Mark & Heiki
> 
> >> Dear readers,
> >>
> >> we keep getting strange, unexpected and sometimes illogical results in
> >> more than one series of SMT experiments using the JRC Acquis parallel
> >> corpus. Often the same methods work fine on Europarl. Our question is
> >
> > Hi Mark,
> >
> > We have been using *extensively* the JRC acquis corpus and I can assure
> > you that we had no big problems. Some colleagues, who have used the
> > program that comes with the corpus, did have some slight problems. I have
> > chosen to unzip the several volumes manually and never had them. For this
> > as well as for other corpora, some characters can derail the training. We
> > have developed Moses for Mere Mortals
> > (http://code.google.com/p/moses-for-mere-mortals/), that provides a
> > Windows add-in (Extract_TMX_Corpus) that helps to clean such things and
> > creates corpora that you can directly feed to Moses (UTF-8, Linux
> > newlines, removal of control characters and so on). Therefore, I can
> > assure you that the JRC acquis definitely works. It seems to me that the
> > Moses team has already published data about their experiments with this
> > corpus. It covers most, if not all, the language pairs of the European
> > Union, which is a plus.
> >
> > Greetings,
> >
> > João
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 

