Hi Cyrine,
tokenizer.perl escapes certain reserved characters (for example, "
becomes &quot; and ' becomes &apos;) so that they don't cause problems
with the Moses decoder. Run tokenizer.perl on both your source and
target training data. At translation time, run tokenizer.perl on your
source input and detokenizer.perl on the translated output.
detokenizer.perl un-escapes the codes, so you end up with the correct
characters.
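
To illustrate the round trip, here is a minimal Python sketch of the
escape and un-escape steps. It is not the Perl scripts themselves. The
function names are hypothetical, and the mapping table reflects the
reserved characters the Moses tokenizer is generally understood to
escape, so check it against your copy of tokenizer.perl. It covers only
the escaping, not the splitting of text into tokens.

```python
# Illustrative sketch only: mimics the character escaping done by
# tokenizer.perl and reversed by detokenizer.perl. The table below is an
# assumption to verify against your Moses version.
MOSES_ESCAPES = [
    ("&", "&amp;"),    # must be escaped first, since the codes contain "&"
    ("|", "&#124;"),   # "|" separates factors in Moses input
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_reserved(text):
    """Replace reserved characters with their escape codes (hypothetical helper)."""
    for raw, code in MOSES_ESCAPES:
        text = text.replace(raw, code)
    return text


def unescape_reserved(text):
    """Undo escape_reserved; "&amp;" is decoded last to avoid double decoding."""
    for raw, code in reversed(MOSES_ESCAPES):
        text = text.replace(code, raw)
    return text


if __name__ == "__main__":
    sentence = 'say "hi" & don\'t'
    escaped = escape_reserved(sentence)
    print(escaped)
    print(unescape_reserved(escaped))
```

The point of the sketch is that the codes seen in the tokenized training
data are expected, and that un-escaping the decoder output restores the
original characters.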
Tom
On 06/18/2014 09:04 PM, Cyrine NASRI wrote:
Hello,
I have a concern about the tokenizer script.
When I do the tokenization, I get some "&quot;" and "&apos;". If I
leave them in during the training process, I think it will damage the
translation quality?
So should I really leave them in, or transform them back to " and '
after training?
Thank you in advance for your reply.
Best, Cyrine
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support