Re: [Moses-support] Tokenizer script

Tom Hoar Wed, 18 Jun 2014 07:18:52 -0700

Hi Cyrine,

The tokenizer.perl script escapes some reserved characters so they don't cause problems with the Moses decoder. You should use tokenizer.perl on your source and target training data. At translation time, use tokenizer.perl on your source input and detokenizer.perl on your translated output. The detokenizer.perl script un-escapes the codes, so you finish with the correct characters.
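
To illustrate the escape/un-escape round trip described above, here is a rough Python sketch. It is not the Perl scripts themselves, and it only mimics the escaping step, not the full tokenization. The function names are hypothetical, and the character-to-entity table is an assumption based on the escapes the Moses tokenizer typically applies, so check it against your own copy of tokenizer.perl.

```python
# Illustrative sketch only: mimics the escaping step, not the full tokenizer.
# ASSUMPTION: this mapping reflects the usual Moses escapes; verify it
# against the tokenizer.perl version you actually run.
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not double-escaped
    ("|", "&#124;"),  # "|" separates factors in Moses input
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(text):
    """Hypothetical helper: replace reserved characters with entity codes."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def unescape_special_chars(text):
    """Hypothetical helper: restore the characters, handling "&amp;" last."""
    for char, entity in reversed(ESCAPES):
        text = text.replace(entity, char)
    return text


tokens = 'He said " it \'s fine "'
escaped = escape_special_chars(tokens)
print(escaped)                          # what training and decoding see
print(unescape_special_chars(escaped))  # what you get back after detokenizing
```

The point is that the codes appear consistently in both the training data and the runtime input, so the models learn them as ordinary tokens, and they are only turned back into the real characters at the very end.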


Tom



On 06/18/2014 09:04 PM, Cyrine NASRI wrote:

Hello,
I have a concern about the tokenizer script.
When I do the tokenization, I get some "&quot;" and "&apos;". If I leave them in for the training process, I think it will damage the translation quality?
So should I really leave them in, or transform them to " and ' after training?

Thank you in advance for your reply

Best Cyrine


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
