Re: [Moses-support] Training of LM and TM containing placeholders

Philipp Koehn Mon, 12 Dec 2011 14:12:25 -0800

Hi,

I would suggest using XML markup to specify translations
for the placeholders.


You can find some more information about this here:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4
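
As a rough illustration of the idea (a minimal sketch, not an official
Moses script), the placeholders from the question below could be rewritten
so that each one carries itself as its required translation. The function
name to_moses_xml and the reuse of the tag name "ph" are assumptions made
for this example; the "translation" attribute follows the XML markup
described on the page above, and the decoder would need to be run with
XML input enabled (for example -xml-input exclusive). Whether characters
such as braces pass through your tokenization and escaping steps unchanged
should be checked on your own data.

```python
import re

# Matches the placeholder markup shown in the question, e.g. <ph x="1">{1}</ph>.
PH_PATTERN = re.compile(r'<ph x="(\d+)">(\{\d+\})</ph>')


def to_moses_xml(line: str) -> str:
    """Rewrite each placeholder so it specifies itself as its translation.

    Hypothetical helper: the name and the choice of tag are illustrative only.
    """
    def repl(match: re.Match) -> str:
        token = match.group(2)  # the visible placeholder, e.g. "{1}"
        # Pad with spaces so the placeholder becomes a separate token.
        return f' <ph translation="{token}">{token}</ph> '

    rewritten = PH_PATTERN.sub(repl, line)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(rewritten.split())


if __name__ == "__main__":
    print(to_moses_xml('See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.'))
    print(to_moses_xml(
        'Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu.'
    ))
```

With markup like this, the decoder is told how to translate each
placeholder instead of having to learn it from the training data.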

-phi

On Sun, Dec 11, 2011 at 6:46 AM, Daniel Schaut <[email protected]> wrote:
> Hi all,
>
> At the moment I’m experimenting with corpus files that contain placeholders.
> Since I’m not a very experienced user, I’d like to ask for some advice. Has
> anyone already experimented with that?
>
> At first sight, I was thinking of removing all instances of placeholders,
> but they make up around 10 % of the corpus files. So I’d like to keep them
> for training, as in a lot of cases they would represent words, e.g.:
>
> Original text strings:
>
> See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.
>
> Removed markup:
>
> See {1} and {2}.
>
> If I removed the placeholders, the sentence structure would obviously be
> broken. Broken sentences would be quite problematic, wouldn’t they?
>
> Other instances of placeholders appear to be meant as inline elements, e.g.:
>
> Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu.
>
> Select an {1}option{2} from the context menu.
>
> My strategy would be to add these placeholders to the list of non-breaking
> prefixes so that they are treated like words. Setting the right distortion
> value should then keep them in place. Is this a good idea?
>
> Best regards,
>
> Daniel
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

