Re: [Moses-support] tokenizer problem

tharaka weheragoda Wed, 30 May 2012 05:54:17 -0700

Thank you very much for your answer. But I'm new to this field and I don't
know how to create nonbreaking_prefix files. Is there any particular
way of doing this? Could you explain a bit more?


On Wed, May 30, 2012 at 6:13 PM, Tom Hoar <
[email protected]> wrote:

> Build your own nonbreaking_prefixes file. Name it with the extension you
> want to use and save it in the nonbreaking_prefixes subfolder under the
> moses scripts/tokenizer folder. The existing files are commented with
> instructions to help you.
>
> Tom
>
>
>
> On Wed, 30 May 2012 17:37:19 +0530, tharaka weheragoda <
> [email protected]> wrote:
>
> Hi everybody,
>
>   When I try to tokenize my Sinhala dataset, it gives me a warning
> message like this:
>  "WARNING: No known abbreviations for language 'si', attempting fall-back
> to English version..."
>
> Also, some of the characters in my text come out changed. Is there any way
> to tokenize Sinhala data with this tokenizer.perl?
>
> I look forward to your help.
>
> Thanks in advance!
> Tharaka
>
>
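
For readers landing on this thread: the advice quoted above (build your own
file, name it with the language extension, save it in the nonbreaking_prefixes
folder) can be sketched roughly as below. This is a minimal sketch, not an
official Moses tool. It assumes the file is named nonbreaking_prefix.si, that
the format is one prefix per line with '#' comment lines and an optional
"#NUMERIC_ONLY#" marker, as in the existing files shipped with Moses. The
helper write_prefix_file is hypothetical, and "Dr", "Mr" and "No" are
placeholders only; replace them with real Sinhala abbreviations.

```python
import tempfile
from pathlib import Path


def write_prefix_file(directory, lang, prefixes, numeric_only=()):
    """Write nonbreaking_prefix.<lang> into `directory` and return its path.

    Hypothetical helper: one prefix per line, '#' lines are comments, and
    prefixes that only apply before a number get the #NUMERIC_ONLY# marker.
    """
    lines = [
        f"# Nonbreaking prefixes for language '{lang}'.",
        "# A period after one of these words is not split off as a sentence end.",
    ]
    lines += list(prefixes)
    lines += [f"{p} #NUMERIC_ONLY#" for p in numeric_only]
    path = Path(directory) / f"nonbreaking_prefix.{lang}"
    # Write UTF-8 explicitly so non-Latin scripts such as Sinhala survive intact.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Demo: write into a temporary folder. In practice, point `directory` at the
# nonbreaking_prefixes subfolder of your own Moses checkout instead.
demo_dir = tempfile.mkdtemp()
demo_path = write_prefix_file(demo_dir, "si", ["Dr", "Mr"], numeric_only=["No"])
print(demo_path.read_text(encoding="utf-8"))
```

Once such a file is in the nonbreaking_prefixes folder, running tokenizer.perl
with -l si should find it instead of falling back to the English version.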

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
