Hi,

the long answer is:
- if you use a phrase-based model, you only need to escape the bar "|"
- if you use XML markup, you are better off escaping "<" and ">", and maybe also the quotes
- if you use the tree-based model, you are better off escaping "[" and "]"
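
To make the cases above concrete, here is a minimal Python sketch of what escaping and de-escaping those characters could look like. It is not the code of the Moses scripts: the function names and the exact entity chosen for each character are assumptions for illustration only, so check the scripts themselves before relying on a particular mapping.

```python
# Illustrative sketch only. The entity chosen for each character is an
# assumption, not a copy of what escape-special-chars.perl does.
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),  # the bar, relevant for phrase-based models
    ("<", "&lt;"),    # XML markup
    (">", "&gt;"),
    ('"', "&quot;"),  # quotes, optional for XML markup
    ("'", "&apos;"),
    ("[", "&#91;"),   # brackets, relevant for tree-based models
    ("]", "&#93;"),
]


def escape_special_chars(line: str) -> str:
    """Replace each special character with its entity (hypothetical helper)."""
    for char, entity in ESCAPES:
        line = line.replace(char, entity)
    return line


def deescape_special_chars(line: str) -> str:
    """Undo escape_special_chars; "&amp;" is restored last (hypothetical helper)."""
    for char, entity in reversed(ESCAPES):
        line = line.replace(entity, char)
    return line


if __name__ == "__main__":
    sample = "a | b < c & d [x]"
    escaped = escape_special_chars(sample)
    print(escaped)
    print(deescape_special_chars(escaped))
```

The ampersand is handled first when escaping and last when de-escaping, so that text which already contains something like "&lt;" survives the round trip unchanged.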

The short answer is: if you use the provided tokenizer, all of this will be taken care of. If you use your own tokenizer, you can run the script escape-special-chars.perl afterwards and run deescape-special-chars.perl on the decoder output.

-phi

On Tue, Jun 12, 2012 at 11:27 AM, <[email protected]> wrote:
> hi all,
>
> I tried to create a language & translation model from my private data sample.
>
> There are some control characters in the data, e.g. ">" and "&", and what I want
> to ask is whether these characters will affect the accuracy of the model
> created from this data?
>
> Do I have to delete all the control characters before using the data to create the
> language & translation model?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
