Sanne Korzec wrote:

> I am having trouble understanding what the recaser is doing exactly
> when evaluating a (dev) test set.
>
> Why do we need to train a recaser?
Because the default setup in Moses is to train caseless models. This is done by lowercasing the parallel corpus before anything else happens. But this means that all output will be lowercase, which is ugly - users uniformly hate it. Plus, in the NIST evaluations, scoring is done casefully.

The Moses recaser is a separate MT model that translates between the languages "lowercase english" and "mixed-CASE English". It is trained from a parallel corpus whose source side is the lowercased English and whose target side is the original English.

> Is there some documentation about which arguments to give to
> train-recaser.perl

There's a little bit here:

http://www.statmt.org/wmt08/baseline.html

But it's pretty minimal.

> Why is there yet another moses.ini file here. I thought at this
> stage we are finished training and thus we do not need the
> moses.ini file anymore.

Because the recaser is a completely separate set of Moses models. Even the language model is different - it's trained from the original English, while the "main" language model is trained from the lowercase English, to match what the main translation model wants to produce.

It's worth noting that there are other ways to deal with translating case. You could simply leave the corpus unaltered, and train everything on caseful data. Then Moses would treat "burger" and "Burger" as completely unrelated words (although the same would go for "the" and "The").

Or you could train a caseless translation model, but use a caseful language model to disambiguate between the possible case patterns for each word. There are a couple of ways people have done the latter: either by using SRILM's disambiguate tool, or by hacking the phrase table to have every likely case pattern for each phrase. I think early versions of Moses used the former approach, while one of Google's entries in the NIST evals used the latter.

Hope this lengthy explanation helped.
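To make the recaser's training data concrete, here is a minimal Python sketch of the lowercase/original parallel corpus described above. It is only an illustration, not how train-recaser.perl is implemented: the function names `make_recaser_pairs` and `count_case_variants` and the two-sentence corpus are made up, and the per-token count table is a toy stand-in for what a real translation model would learn.

```python
from collections import Counter, defaultdict


def make_recaser_pairs(cased_lines):
    """Pair each lowercased sentence (source side) with its original (target side)."""
    return [(line.lower(), line) for line in cased_lines]


def count_case_variants(pairs):
    """For every lowercase token, count how often each cased form occurs.

    Toy illustration only: a real recaser also uses a caseful language
    model to choose among the variants in context.
    """
    table = defaultdict(Counter)
    for lower, cased in pairs:
        # Lowercasing does not change whitespace, so tokens line up one-to-one.
        for lower_tok, cased_tok in zip(lower.split(), cased.split()):
            table[lower_tok][cased_tok] += 1
    return table


# Hypothetical tokenized corpus, invented for this example.
corpus = [
    "The burger was good .",
    "John Burger ate the burger .",
]

pairs = make_recaser_pairs(corpus)
table = count_case_variants(pairs)

for token in ("the", "burger"):
    print(token, dict(table[token]))
```

Both "the" and "burger" end up with more than one cased form, which is why the recaser needs its own caseful language model to choose between them in context.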
- John Burger (not to be confused with john burger)
  MITRE

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
