There are two general ways to approach this requirement.

1) Prepare the source and target training corpus so that the inline tags are not split during tokenization. Train an SMT model with these tags. Finally, prepare your production translation work in the same manner. The SMT model will do its best to place the tags during translation.

2) Strip the tags from your training corpus and train a model. Strip the tags for production work. Then parse the input/output pair to insert the tags in the target text.
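For the first approach, the key point is that each tag must survive tokenization as a single token. Here is a minimal sketch of that idea in Python. The function name tokenize_keep_tags and the regular expression are my own illustration, not part of Moses, and a real tokenizer handles far more cases.

```python
import re

# Try to match a whole inline tag first (e.g. <b>, </b>, <br/>),
# then a run of word characters, then a single punctuation character.
# Hypothetical pattern for illustration only.
TOKEN_RE = re.compile(r"<[^<>]+>|\w+|[^\w\s]")


def tokenize_keep_tags(text):
    """Split text into tokens while keeping each inline tag in one piece."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize_keep_tags("Click <b>Save</b>, then close."))
```

Note that a tag with attributes, such as one containing a space, would still be split in a whitespace-delimited corpus, so in practice such tags would need to be replaced by space-free placeholders before training.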

The open source project Moses for Localization (M4Loc) has limited support for the second approach. Commercial solutions like DoMT from PTTools have more complete support for the second approach.
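To make the second approach concrete, here is a rough sketch of the strip-and-reinsert idea in Python. It is not how M4Loc or DoMT work internally. The function names are hypothetical, and it assumes you can obtain a source-to-target word alignment for each sentence (passed in here as a plain dict), which is the hard part in practice.

```python
import re

# Capturing group so that split() keeps the tags in its output.
TAG_RE = re.compile(r"(<[^<>]+>)")


def strip_tags(text):
    """Return (tokens, tags); tags holds (source_token_index, tag, is_closing)."""
    tokens, tags = [], []
    for piece in TAG_RE.split(text):
        if TAG_RE.fullmatch(piece):
            is_closing = piece.startswith("</")
            # An opening tag attaches to the next token,
            # a closing tag to the previous one.
            idx = len(tokens) - 1 if is_closing else len(tokens)
            tags.append((idx, piece, is_closing))
        else:
            tokens.extend(piece.split())
    return tokens, tags


def reinsert_tags(target_tokens, tags, alignment):
    """Place tags around the target tokens aligned to their source tokens.

    alignment is an assumed dict {source_index: target_index}.
    """
    before, after = {}, {}
    for src_idx, tag, is_closing in tags:
        tgt_idx = alignment.get(src_idx)
        if tgt_idx is None:
            # Fallback for unaligned tokens: clamp to a valid target position.
            tgt_idx = min(max(src_idx, 0), len(target_tokens) - 1)
        (after if is_closing else before).setdefault(tgt_idx, []).append(tag)
    out = []
    for i, tok in enumerate(target_tokens):
        out.extend(before.get(i, []))
        out.append(tok)
        out.extend(after.get(i, []))
    return " ".join(out)


if __name__ == "__main__":
    tokens, tags = strip_tags("Click <b>Save</b> now")
    target = ["Cliquez", "sur", "Enregistrer", "maintenant"]
    print(reinsert_tags(target, tags, {0: 0, 1: 2, 2: 3}))
```

This simple version can produce crossed or misplaced tags when a tagged span covers several words that the translation reorders, which is one reason the dedicated tools exist.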

Tom



On 10/10/2014 12:53 AM, Alysson Andrade wrote:

Hello World,

I didn’t find this for myself, so I’m asking here. Probably it’s a common question, and other people also had it in the past.

1 - What can I do with inline tags in a file?

2 - And how can I “re-tag” the text, putting the tags in the correct words again?

Thanks guys

Alysson



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
