Hi Cyrine,

tokenizer.perl escapes certain reserved characters (for example, " becomes &quot; and ' becomes &apos;) so they don't cause problems for the Moses decoder. You should run tokenizer.perl on both your source and target training data, and leave the escaped codes in place for training. At translation time, run tokenizer.perl on your source input and detokenizer.perl on your translated output. detokenizer.perl un-escapes the codes, so you end up with the correct characters.
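
As a rough illustration of that escape/un-escape round trip, here is a small Python sketch. It is not the Moses scripts themselves: the function names are hypothetical, and the character-to-code table is an assumption based on what tokenizer.perl is understood to emit, so check it against your copy of the script.

```python
# Illustrative sketch only; NOT the actual tokenizer.perl / detokenizer.perl code.
# The mapping below is an assumption about the codes tokenizer.perl produces.
ESCAPES = [
    ("&", "&amp;"),    # must be first, so the codes added below are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_reserved(text):
    """Replace reserved characters with codes (hypothetical helper)."""
    for raw, code in ESCAPES:
        text = text.replace(raw, code)
    return text


def unescape_reserved(text):
    """Undo escape_reserved; '&amp;' is restored last (hypothetical helper)."""
    for raw, code in reversed(ESCAPES):
        text = text.replace(code, raw)
    return text


if __name__ == "__main__":
    original = 'He said "don\'t" & left'
    escaped = escape_reserved(original)
    print(escaped)
    print(unescape_reserved(escaped) == original)
```

The point of the sketch is that the codes seen after tokenization are reversible, so they only need to be converted back once, on the final translated output.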

Tom



On 06/18/2014 09:04 PM, Cyrine NASRI wrote:
Hello,
I have a concern about the tokenizer script.

When I do the tokenization, I get some "&quot;" and "&apos;". When I leave them in for the training process, I think it damages the translation quality?
So should I really leave them, or transform them to " and ' after training?

Thank you in advance for your reply

Best Cyrine


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
