Re: [Moses-support] Do i have delete the control characters in the corpus, before using it to create the language& translation model

Philipp Koehn Tue, 12 Jun 2012 03:48:28 -0700

Hi,

The long answer is:
- if you use a phrase-based model, you only need to escape the vertical
  bar "|", since it is used as the factor separator.
- if you use XML markup, you are better off escaping "<" and ">",
  and maybe also the quotes.
- if you use the tree-based model, you are better off escaping "[" and "]",
  since they mark non-terminals.


The short answer is: if you use the provided tokenizer, all this will be taken
care of. If you use your own tokenizer, you can run the script
escape-special-chars.perl on the corpus afterwards, and run
deescape-special-chars.perl on the decoder output.
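
For illustration only, here is a minimal Python sketch of this kind of
escaping and its inverse. The entity mapping and the function names are
assumptions chosen for the example, not copied from the Moses scripts, so
check the scripts themselves for the exact replacements they apply.

```python
# Illustrative mapping (an assumption, not taken from the Moses scripts).
# "&" must be escaped first so the entities added afterwards are not
# escaped a second time.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),   # factor separator
    ("<", "&lt;"),     # XML markup
    (">", "&gt;"),     # XML markup
    ("'", "&apos;"),   # quotes
    ('"', "&quot;"),   # quotes
    ("[", "&#91;"),    # tree-based models
    ("]", "&#93;"),    # tree-based models
]


def escape_special_chars(line):
    """Replace each special character in a tokenized line with its entity."""
    for raw, entity in ESCAPES:
        line = line.replace(raw, entity)
    return line


def deescape_special_chars(line):
    """Undo escape_special_chars, e.g. on decoder output."""
    # Reverse order, so "&amp;" is restored last.
    for raw, entity in reversed(ESCAPES):
        line = line.replace(entity, raw)
    return line


if __name__ == "__main__":
    sample = 'price < 5 | "cheap" & [ok]'
    escaped = escape_special_chars(sample)
    print(escaped)
    print(deescape_special_chars(escaped))
```

The order matters in both directions: escaping "&" first and restoring it
last keeps the round trip lossless, even for text that already contains
something that looks like an entity.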

-phi

On Tue, Jun 12, 2012 at 11:27 AM,  <[email protected]> wrote:
> hi all,
>
>
>
> I tried to create a language & translation model from my private data sample.
>
> There are some control characters in the data, e.g. ">" and "&", and what I
> want to ask is whether these characters will affect the accuracy of the model
> created from this data.
>
> Do I have to delete all the control characters before using the data to
> create the language & translation model?
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>