There are two general ways to approach this requirement.
1) Prepare the source and target training corpus such that the inline
tags are not split during tokenization. Train an SMT model with
these tags. Finally, prepare your production translation work in the
same manner. The SMT model will do its best to place the tags during
translation.
2) Strip the tags from your training corpus and train a model. Strip the
tags from your production input as well. Then parse each input/output
pair to reinsert the tags into the target text.
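
To make the second approach more concrete, here is a rough Python sketch
of the strip-and-reinsert step. It is only an illustration under several
assumptions, not how M4Loc or DoMT actually do it: the function names
(strip_tags, reinsert_tags) are made up, the tags are assumed to be
well-formed XML-style pairs, and you are assumed to have a
source-to-target word alignment for each sentence (here a plain dict).

```python
import re

# Assumption: inline tags look like <b>, </b> or <br/>.
TAG_RE = re.compile(r"(<[^>]+>)")

def strip_tags(text):
    """Return (plain tokens, [(position, tag), ...]).

    position is the number of plain tokens that precede the tag.
    """
    tokens, tags = [], []
    for piece in TAG_RE.split(text):
        if TAG_RE.fullmatch(piece):
            tags.append((len(tokens), piece))
        else:
            tokens.extend(piece.split())
    return tokens, tags

def reinsert_tags(target_tokens, tags, alignment):
    """Place tags in the target using a source->target index dict.

    Hypothetical helper: a paired tag is wrapped around all target
    tokens aligned to the source tokens it enclosed.
    """
    before, after, trailing = {}, {}, []
    stack, spans = [], []
    for idx, (pos, tag) in enumerate(tags):
        if tag.endswith("/>"):            # standalone tag, e.g. <br/>
            tgt = alignment.get(pos)
            if tgt is None:
                trailing.append(tag)
            else:
                before.setdefault(tgt, []).append(tag)
        elif tag.startswith("</"):        # closes the most recent open tag
            open_idx, open_pos, open_tag = stack.pop()
            spans.append((open_idx, open_pos, pos, open_tag, tag))
        else:
            stack.append((idx, pos, tag))
    for _, open_pos, close_pos, open_tag, close_tag in sorted(spans):
        covered = [alignment[i] for i in range(open_pos, close_pos)
                   if i in alignment]
        if covered:
            before.setdefault(min(covered), []).append(open_tag)
            after.setdefault(max(covered), []).insert(0, close_tag)
        else:                             # nothing aligned: append at end
            trailing.extend([open_tag, close_tag])
    out = []
    for i, tok in enumerate(target_tokens):
        out.extend(before.get(i, []))
        out.append(tok)
        out.extend(after.get(i, []))
    out.extend(trailing)
    return " ".join(out)

source = "Click the <b>red button</b> now"
src_tokens, src_tags = strip_tags(source)
# The translation and alignment are made up for this example.
target = "Clique no botão vermelho agora".split()
alignment = {0: 0, 1: 1, 2: 3, 3: 2, 4: 4}
print(reinsert_tags(target, src_tags, alignment))
# Clique no <b> botão vermelho </b> agora
```

Wrapping the whole aligned span, rather than moving each tag on its own,
keeps the open/close pair in order when the translation reorders words,
as with "red button" above. Real documents will need more care (unaligned
words, discontiguous spans, detokenization around the tags).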
The open-source project Moses for Localization (M4Loc) has limited
support for the second approach. Commercial solutions like DoMT from
PTTools have more complete support for the second approach.
Tom
On 10/10/2014 12:53 AM, Alysson Andrade wrote:
Hello World,
I couldn’t find an answer to this myself, so I’m asking here. It’s
probably a common question that other people have asked in the past.
1 - What can I do with inline tags in a file?
2 - And how can I “re-tag” the text, putting the tags back on the
correct words?
Thanks guys
Alysson
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support