Hi, I don't know whether this is the default setting or whether there is an option to change it, but the tokenizer script is not splitting on hyphens (-). I would like hyphens to be separated out as distinct tokens, so that, for example, "high-energy" is tokenized as "high - energy" rather than as a single token, as it is now.
Thanks for your help.
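As a rough illustration of the desired behaviour, here is a minimal sketch of a post-processing step that could run on the tokenizer's output. It assumes that only hyphens sitting between two word characters should be split; `split_hyphens` is a hypothetical helper, not part of Moses. (The Moses tokenizer.perl script also has an `-a` "aggressive hyphen splitting" option, but it marks the split hyphen as `@-@` rather than a bare `-`, so check which form your pipeline expects.)

```python
import re

# Hypothetical helper, not part of Moses: match a hyphen only when it has a
# word character immediately before and after it (e.g. "high-energy").
_INTRA_WORD_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def split_hyphens(line: str) -> str:
    """Return `line` with every intra-word hyphen surrounded by spaces."""
    return _INTRA_WORD_HYPHEN.sub(" - ", line)


if __name__ == "__main__":
    print(split_hyphens("a high-energy beam"))  # a high - energy beam
    print(split_hyphens("well-to-do"))          # well - to - do
```

Standalone hyphens (for example a leading "- item") are left untouched because the lookbehind requires a word character before the hyphen.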
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
