Hi Vincent,

This section of the documentation describes some domain adaptation
methods that are implemented in Moses:
http://www.statmt.org/moses/?n=Advanced.Domain

It is incomplete (focusing on parallel data and the translation model), 
and does not recommend best practices.

In general, my recommendation is to use in-domain data whenever possible:
for the language model, for the translation model, and as held-out data
for tuning and testing. Out-of-domain data can help, but it can also hurt
your system: the effect depends on your domains and on the amount of data
you have for each. Data selection, instance weighting, model
interpolation and domain features are different methods that aim to keep
the benefits of out-of-domain data while reducing its harmful effects,
and they are often better than just concatenating all the data you have.
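
To give a rough idea of what model interpolation means, here is a toy
sketch in Python. It is not how Moses implements it: the function name,
the weight of 0.8 and all probabilities are made up for illustration, and
in practice the weight would be tuned on held-out in-domain data.

```python
def interpolate(in_domain, out_domain, weight_in=0.8):
    """Linearly interpolate two toy translation models.

    Each model maps (source_phrase, target_phrase) -> probability.
    A pair missing from one model counts as probability 0 in that model.
    """
    if not 0.0 <= weight_in <= 1.0:
        raise ValueError("weight_in must be in [0, 1]")
    pairs = set(in_domain) | set(out_domain)
    return {
        pair: weight_in * in_domain.get(pair, 0.0)
        + (1.0 - weight_in) * out_domain.get(pair, 0.0)
        for pair in pairs
    }


# Hypothetical probabilities p(target | source) for the German word "Bank".
finance = {("Bank", "bank"): 0.9, ("Bank", "bench"): 0.1}
general = {("Bank", "bank"): 0.5, ("Bank", "bench"): 0.5}

mixed = interpolate(finance, general, weight_in=0.8)
for pair in sorted(mixed):
    print(pair, round(mixed[pair], 3))
```

With a high in-domain weight, the finance reading of "Bank" stays
dominant, while phrase pairs seen only in the out-of-domain data would
still get a small, non-zero probability.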

best wishes,
Rico


On 14/08/15 16:22, Vincent Nguyen wrote:
> Hi,
>
> I can't find any kind of "tutorial" on domain adaptation to follow.
> I read this in the doc:
> The language model should be trained on a corpus that is suitable to the
> domain. If the translation model is trained on a parallel corpus, then
> the language model should be trained on the output side of that corpus,
> although using additional training data is often beneficial.
>
> And in the training section of the EMS, there is a sub section with
> domain-features=....
>
> What is the best practice ?
>
> Let's say for instance that I would like to specialize my model in
> finance translation, with a specific corpus.
>
> Should I train the Language model with finance stuff ?
> Should I include parallel corpus in the translation model training ?
> Should I tune with financial data sets ?
>
> Please help me to understand.
> Vincent
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
