Hi, I don't know whether this is the default behaviour and there is an
option to change it, but the tokenizer script is ignoring hyphens (-). I
would like it to split them off as separate tokens, so that, for example,
"high-energy" is tokenized as "high - energy" rather than as the single
token it produces now.
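
To make the desired output concrete, here is a minimal Python sketch of the
behaviour I am after. It is not the Moses tokenizer script, and the function
name split_hyphens is made up purely for illustration; it assumes that only
hyphens sitting between two word characters should be split off.

```python
import re

def split_hyphens(text):
    # Hypothetical helper, not part of Moses: put spaces around any hyphen
    # that has a word character directly before and after it.
    return re.sub(r"(?<=\w)-(?=\w)", " - ", text)

print(split_hyphens("high-energy"))
```

Hyphens that already have whitespace on either side are left as they are.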

Thanks for your help.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
