Hi,

I would suggest using XML markup to specify the translations for the placeholders.
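As a rough sketch of what I mean (not a tested recipe): the snippet below rewrites placeholders of the form <ph x="1">{1}</ph>, as in your examples, into Moses XML input markup whose translation attribute is the placeholder itself, so the decoder copies it through unchanged. The helper name to_moses_xml and the choice to reuse "ph" as the tag name are my own assumptions; Moses does not care about the tag name. The decoder has to be started with -xml-input exclusive (or inclusive) for the markup to take effect, and you would still need to make sure your tokenizer does not break the tags apart.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches placeholders such as <ph x="1">{1}</ph> (format taken from the examples).
PH_RE = re.compile(r'<ph x="(\d+)">(\{\d+\})</ph>')


def to_moses_xml(segment: str) -> str:
    """Rewrite <ph> placeholders as Moses XML markup (hypothetical helper).

    Each placeholder becomes <ph translation="{n}">{n}</ph>, i.e. the
    suggested translation is the placeholder token itself.
    """
    def repl(match: re.Match) -> str:
        token = match.group(2)  # e.g. "{1}"
        # Pad with spaces so the placeholder is a separate token.
        return f" <ph translation={quoteattr(token)}>{token}</ph> "

    marked = PH_RE.sub(repl, segment)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(marked.split())


if __name__ == "__main__":
    print(to_moses_xml('See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.'))
    print(to_moses_xml(
        'Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu.'
    ))
```

For training you would keep the plain {1}, {2} tokens in the corpus; the markup is only needed on the input side at decoding time.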
You can find some more information about this here:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4

-phi

On Sun, Dec 11, 2011 at 6:46 AM, Daniel Schaut <[email protected]> wrote:
> Hi all,
>
> At the moment I’m experimenting with corpus files that contain placeholders.
> Since I’m not a very experienced user, I’d like to ask for some advice. Has
> anyone already experimented with that?
>
> At first sight, I was thinking of removing all instances of placeholders,
> but they make up around 10 % of the corpus files. So I’d like to keep them
> for training, as in a lot of cases they would represent words, e.g.:
>
> Original text strings:
>
> See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.
>
> Removed markup:
>
> See {1} and {2}.
>
> If I removed the placeholders, the sentence structure would obviously be
> broken. Broken sentences should be quite problematic, shouldn’t they?
>
> Other instances of placeholders appear to be meant as inline elements, e.g.:
>
> Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu.
>
> Select an {1}option{2} from the context menu.
>
> My strategy would be to add these placeholders to the list of non-breaking
> prefixes in order to have them treated like words. Then setting the right
> distortion value should do the trick and keep them in place. Is this a good
> idea?
>
> Best regards,
>
> Daniel
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
