Thank you for your help, Tom. I looked into those files, but reading the existing files didn't give me any idea of how to write language-specific rules. Is there anybody who can help me?
On Wed, May 30, 2012 at 6:39 PM, Tom Hoar <[email protected]> wrote:
> When you compiled Moses, it created a scripts folder. In there, you'll
> find the subfolder "scripts/tokenizer/nonbreaking_prefixes". The files in
> this folder all have the same name with a 2-letter language code extension.
> These files have language-specific rules for how the tokenizer &
> detokenizer work.
>
> Anyone, is there a better resource than reading the existing files to
> learn how the files work?
>
> Tom
>
> On Wed, 30 May 2012 18:22:52 +0530, tharaka weheragoda
> <[email protected]> wrote:
>
> Thank you very much for your answer. But I'm new to this field and I'm
> not aware of how to create nonbreaking_prefix files. Is there any
> particular way of doing this? Can you explain a bit more?
>
> On Wed, May 30, 2012 at 6:13 PM, Tom Hoar <[email protected]> wrote:
>
>> Build your own nonbreaking_prefixes file. Name it with the extension you
>> want to use and save it in the nonbreaking_prefixes subfolder under the
>> moses scripts/tokenizer folder. The existing files are commented with
>> instructions to help you.
>>
>> Tom
>>
>> On Wed, 30 May 2012 17:37:19 +0530, tharaka weheragoda
>> <[email protected]> wrote:
>>
>> Hi everybody,
>>
>> When I try to tokenize my Sinhala dataset, it gives me a warning
>> message like this:
>> "WARNING: No known abbreviations for language 'si', attempting fall-back
>> to English version..."
>>
>> And my letters have changed a bit. Is there any way to tokenize Sinhala
>> data with this tokenizer.perl?
>>
>> I'm looking forward to your help.
>>
>> Thanks in advance!
>> Tharaka
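
The layout of the existing files in that folder can be sketched in code. This is an illustration only, not the Moses implementation. It assumes the file is named nonbreaking_prefix.si, that lines starting with '#' are comments, that every other line holds one abbreviation written without its trailing period, and that the #NUMERIC_ONLY# marker flags a prefix that is non-breaking only when a number follows. The entries Dr, Prof and No are placeholders to be replaced with real Sinhala abbreviations, and parse_prefixes and write_prefix_file are hypothetical helper names.

```python
import re
import tempfile
from pathlib import Path

# Placeholder entries only: replace them with real Sinhala abbreviations.
SAMPLE = """\
# Lines starting with '#' are comments.
# One prefix per line, written without the trailing period.
Dr
Prof
# Non-breaking only when a number follows, e.g. "No. 5"
No #NUMERIC_ONLY#
"""

# A prefix followed by whitespace and the numeric-only marker.
NUMERIC_ONLY = re.compile(r"^(.*?)\s+#NUMERIC_ONLY#\s*$")


def parse_prefixes(text):
    """Return {prefix: 1 or 2}; 2 marks a numeric-only prefix."""
    prefixes = {}
    for line in text.splitlines():
        item = line.strip()
        # Skip blank lines and comment lines.
        if not item or item.startswith("#"):
            continue
        match = NUMERIC_ONLY.match(item)
        if match:
            prefixes[match.group(1)] = 2
        else:
            prefixes[item] = 1
    return prefixes


def write_prefix_file(directory, lang, text):
    """Write the rules to <directory>/nonbreaking_prefix.<lang> as UTF-8."""
    path = Path(directory) / f"nonbreaking_prefix.{lang}"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # A temporary folder stands in for the real nonbreaking_prefixes folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_prefix_file(tmp, "si", SAMPLE)
        print(path.name)
        print(parse_prefixes(path.read_text(encoding="utf-8")))
```

If the assumptions hold, saving such a file in the nonbreaking_prefixes folder and rerunning tokenizer.perl with -l si should stop the fall-back to the English rules. The file needs to be saved as UTF-8 so the Sinhala characters survive.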
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
