You could try this tutorial:

http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf
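
As a rough illustration of the model interpolation mentioned in the quoted
message below, here is a toy sketch in Python. It is not Moses code: the
unigram models, the example sentences and the names unigram_model,
interpolate and lam are all made up for this example. It only shows how
giving more weight to an in-domain model shifts probability towards
in-domain words such as "equities".

```python
from collections import Counter


def unigram_model(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * p_in[w] + (1.0 - lam) * p_out[w] for w in p_in}


# Toy data, invented for illustration only (assumption, not real corpora).
in_domain = ["equities rose today", "investors bought equities"]
out_domain = ["the titles of the books", "film titles and book titles"]

# Shared vocabulary so both models assign a probability to every word.
vocab = sorted({tok for s in in_domain + out_domain for tok in s.split()})
p_in = unigram_model(in_domain, vocab)
p_out = unigram_model(out_domain, vocab)

# Compare the two candidate words as the in-domain weight grows.
for lam in (0.0, 0.5, 0.9):
    mixed = interpolate(p_in, p_out, lam)
    print(f"lambda={lam}: P(equities)={mixed['equities']:.3f} "
          f"P(titles)={mixed['titles']:.3f}")
```

In a real system you would interpolate full n-gram models with your LM
toolkit, and the weight is usually chosen to minimise perplexity on
held-out in-domain text rather than set by hand.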

On 14/08/15 20:20, Vincent Nguyen wrote:
> I had read this section, which deals with translation model combination,
> but it does not say much about the language model or tuning.
>
> For instance, suppose I want to make sure that a specific expression,
> "titres", is translated as "equities" from French to English.
>
> Do these two words specifically have to be in the monolingual corpus of
> the language model, or in the parallel corpus?
>
> If two "parallel expressions" are in the tuning set but are present in
> neither the parallel corpora nor the monolingual LM data, can that still
> trigger a good translation?
>
> I am not sure I am being clear ...
>
> Thanks again for your help.
>
>
> On 14/08/2015 20:52, Rico Sennrich wrote:
>> Hi Vincent,
>>
>> this section describes some domain adaptation methods that are
>> implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain
>>
>> It is incomplete (focusing on parallel data and the translation model),
>> and does not recommend best practices.
>>
>> In general, my recommendation is to use in-domain data whenever possible
>> (for the language model, translation model, and held-out in-domain data
>> for tuning/testing). Out-of-domain data can help, but also hurt your
>> system: the effect depends on your domains and the amount of data you
>> have for each. Data selection, instance weighting, model interpolation
>> and domain features are different methods that give you the benefits of
>> out-of-domain data, but reduce its harmful effects, and are often better
>> than just concatenating all the data you have.
>>
>> best wishes,
>> Rico
>>
>>
>> On 14/08/15 16:22, Vincent Nguyen wrote:
>>> Hi,
>>>
>>> I can't find any kind of "tutorial" on domain adaptation to follow.
>>> I read this in the documentation:
>>> The language model should be trained on a corpus that is suitable to the
>>> domain. If the translation model is trained on a parallel corpus, then
>>> the language model should be trained on the output side of that corpus,
>>> although using additional training data is often beneficial.
>>>
>>> And in the training section of the EMS, there is a subsection with
>>> domain-features=....
>>>
>>> What is the best practice?
>>>
>>> Let's say, for instance, that I would like to specialize my model in
>>> finance translation, using a domain-specific corpus.
>>>
>>> Should I train the language model on finance data?
>>> Should I include a finance parallel corpus in the translation model training?
>>> Should I tune with financial data sets?
>>>
>>> Please help me to understand.
>>> Vincent
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
