Thank you for your help, Tom. I looked into those files, but reading the
existing files did not give me any idea of how to write language-specific
rules. Is there anybody who can help me?

On Wed, May 30, 2012 at 6:39 PM, Tom Hoar <
[email protected]> wrote:

> When you compiled Moses, it created a scripts folder. In there, you'll
> find the subfolder "scripts/tokenizer/nonbreaking_prefixes". The files in
> this folder all have the same name with a 2-letter language code extension.
> These files have language-specific rules for how the tokenizer & detokenizer
> work.
>
> Anyone, is there a better resource than reading the existing files to
> learn how the files work?
>
> Tom
>
>
>
> On Wed, 30 May 2012 18:22:52 +0530, tharaka weheragoda <
> [email protected]> wrote:
>
> Thank you very much for your answer. But I'm new to this field and I'm not
> aware of how to create nonbreaking_prefixes files. Is there any particular
> way of doing this? Can you explain a bit more?
>
> On Wed, May 30, 2012 at 6:13 PM, Tom Hoar <
> [email protected]> wrote:
>
>> Build your own nonbreaking_prefixes file. Name it with the extension you
>> want to use and save it in the nonbreaking_prefixes subfolder under the
>> moses scripts/tokenizer folder. The existing files are commented with
>> instructions to help you.
>>
>> Tom
>>
>>
>>
>> On Wed, 30 May 2012 17:37:19 +0530, tharaka weheragoda <
>> [email protected]> wrote:
>>
>> Hi everybody,
>>
>>   When I try to tokenize my Sinhala dataset, I get a warning message
>> like this:
>>  "WARNING: No known abbreviations for language 'si', attempting fall-back
>> to English version..."
>>
>> Some of my letters have also been changed slightly. Is there any way to
>> tokenize Sinhala data with this tokenizer.perl?
>>
>> I'm looking forward to your help.
>>
>> Thanks in advance!
>> Tharaka
>>
>>
>
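
For anyone finding this thread later, here is a rough sketch of generating
such a file for 'si'. It rests on assumptions drawn from reading the existing
files, not on any official specification: the file is named
nonbreaking_prefix.<lang>, it lists one abbreviation per line without the
trailing period, lines starting with '#' are comments, and an entry followed
by "#NUMERIC_ONLY#" is protected only when a number comes next. The helper
names (format_prefix_file, write_prefix_file) are made up for illustration,
and the "Dr"/"Mr"/"No" entries are placeholders to be replaced with real
Sinhala abbreviations.

```python
from pathlib import Path


def format_prefix_file(prefixes, numeric_only=()):
    """Return the text of a nonbreaking-prefix file (assumed format)."""
    lines = [
        "# One abbreviation per line, written without the trailing period.",
        "# Lines starting with '#' are comments.",
    ]
    # Plain entries: a period after these should not be split off.
    lines.extend(prefixes)
    # Entries assumed to be protected only when followed by a number.
    lines.extend(f"{p} #NUMERIC_ONLY#" for p in numeric_only)
    return "\n".join(lines) + "\n"


def write_prefix_file(folder, lang, prefixes, numeric_only=()):
    """Write nonbreaking_prefix.<lang> into folder as UTF-8 and return its path."""
    path = Path(folder) / f"nonbreaking_prefix.{lang}"
    path.write_text(format_prefix_file(prefixes, numeric_only), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder entries only; substitute real Sinhala abbreviations.
    print(format_prefix_file(["Dr", "Mr"], numeric_only=["No"]))
```

Calling write_prefix_file with the nonbreaking_prefixes folder and "si" should
produce a file the tokenizer can pick up instead of falling back to English,
provided the naming assumption above matches your Moses version.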
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
