Build your own nonbreaking_prefixes file. Give it your language code as
the file extension (si in your case) and save it in the
nonbreaking_prefixes subfolder under the Moses scripts/tokenizer folder.
The existing files there are commented with instructions to help you.
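
As a rough sketch of what that involves, the Python snippet below writes
such a file. It assumes the layout used by the existing files: one prefix
per line, comment lines starting with "#", an optional "#NUMERIC_ONLY#"
tag for prefixes that should only hold a period before a number, and a
file name of the form nonbreaking_prefix.<language code>. The function
names and the prefix_dir argument are made up for illustration, and the
prefix lists must be filled with real Sinhala abbreviations; none are
supplied here.

```python
from pathlib import Path


def render_prefix_file(prefixes, numeric_only=()):
    """Return the text of a nonbreaking-prefix file, one entry per line."""
    lines = [
        "# Nonbreaking prefixes: a period after one of these tokens",
        "# is not treated as the end of a sentence.",
        "",
    ]
    lines.extend(prefixes)
    # Tagged entries only protect the period when a number follows.
    lines.extend(f"{p} #NUMERIC_ONLY#" for p in numeric_only)
    return "\n".join(lines) + "\n"


def write_prefix_file(prefix_dir, lang, prefixes, numeric_only=()):
    """Write nonbreaking_prefix.<lang> into prefix_dir and return its path.

    prefix_dir is assumed to be the existing nonbreaking_prefixes folder
    of your Moses checkout; it is not created here.
    """
    target = Path(prefix_dir) / f"nonbreaking_prefix.{lang}"
    target.write_text(render_prefix_file(prefixes, numeric_only),
                      encoding="utf-8")
    return target
```

Calling write_prefix_file with the path to your nonbreaking_prefixes
folder, "si", and your abbreviation lists should produce a file that
tokenizer.perl can pick up when run with -l si, assuming it looks the
file up by language code as the warning suggests.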

Tom 

On Wed, 30 May 2012 17:37:19 +0530, tharaka weheragoda wrote:

Hi everybody,

When I try to tokenize my Sinhala dataset, it gives me a warning
message like this:

"WARNING: No known abbreviations for language 'si', attempting fall-back
to English version..."

My letters have also changed a bit. Is there any way to tokenize
Sinhala data with tokenizer.perl?

I'm looking forward to your help.

Thanks in advance!
Tharaka 
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support