Build your own nonbreaking_prefixes file. Name it with the language code you want to use as its extension (si in your case) and save it in the nonbreaking_prefixes subfolder under the Moses scripts/tokenizer folder. The existing files in that subfolder are commented with instructions to help you.
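As a rough illustration, here is a minimal Python sketch that writes such a file. It assumes the file is named nonbreaking_prefix.<lang>, like the existing files in that subfolder, that each line holds one prefix, that lines starting with # are comments, and that a trailing #NUMERIC_ONLY# marks a prefix that is non-breaking only before a number. Check the comments in the existing files before relying on any of this. The function names and the entries ("Dr", "Mr", "No") are hypothetical placeholders, not real Sinhala abbreviations.

```python
from pathlib import Path


def render_prefix_file(prefixes, numeric_only=()):
    """Return the text of a nonbreaking-prefix file (format assumed, see above)."""
    lines = [
        "# Hypothetical placeholder entries; replace with real abbreviations.",
        "# Lines starting with '#' are comments.",
    ]
    # One prefix per line: a following period should not end the sentence.
    lines.extend(prefixes)
    # Prefixes that are non-breaking only when followed by a number.
    lines.extend(f"{p} #NUMERIC_ONLY#" for p in numeric_only)
    return "\n".join(lines) + "\n"


def write_prefix_file(directory, lang, prefixes, numeric_only=()):
    """Write nonbreaking_prefix.<lang> into `directory` and return its path."""
    path = Path(directory) / f"nonbreaking_prefix.{lang}"
    path.write_text(render_prefix_file(prefixes, numeric_only), encoding="utf-8")
    return str(path)


if __name__ == "__main__":
    import tempfile

    # Written to a temporary folder here; point `directory` at the
    # nonbreaking_prefixes subfolder of your own Moses checkout instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_prefix_file(tmp, "si", ["Dr", "Mr"], ["No"]))
```

Once the file is in place, running tokenizer.perl with -l si should pick it up rather than falling back to the English list.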
Tom

On Wed, 30 May 2012 17:37:19 +0530, tharaka weheragoda wrote:

Hi everybody,

When I try to tokenize my Sinhala dataset, it gives me a warning message like this:

"WARNING: No known abbreviations for language 'si', attempting fall-back to English version..."

My letters have also changed a bit. Is there any way to tokenize Sinhala data with tokenizer.perl?

I'm looking forward to your help. Thanks in advance!

Tharaka
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
